Data Warehouse 4.0
# Data Architecture
# Data Generation Module
# Servers
ha01(192.168.220.201) | ha02(192.168.220.202) | ha03(192.168.220.203) |
---|---|---|
# Passwordless SSH Login
Run the following commands on all three machines:

    ssh-keygen -t rsa
    ssh-copy-id 192.168.220.201
    ssh-copy-id 192.168.220.202
    ssh-copy-id 192.168.220.203

# Cluster Distribution Script
Create a bin folder in the user's home directory, /home/damoncai:

    mkdir bin

Create the script file:

    cd /home/damoncai/bin
    vim xsync

    #!/bin/bash
    # 1. Check the number of arguments
    if [ $# -lt 1 ]
    then
        echo "Not enough arguments!"
        exit
    fi

    # 2. Loop over every machine in the cluster
    for host in 192.168.220.201 192.168.220.202 192.168.220.203
    do
        echo ==================== $host ====================
        # 3. Loop over every file or directory and send them one by one
        for file in "$@"
        do
            # 4. Check that the file exists
            if [ -e "$file" ]
            then
                # 5. Resolve the parent directory
                pdir=$(cd -P "$(dirname "$file")"; pwd)
                # 6. Get the name of the file itself
                fname=$(basename "$file")
                ssh $host "mkdir -p $pdir"
                rsync -av "$pdir/$fname" $host:"$pdir"
            else
                echo "$file does not exist!"
            fi
        done
    done

Make the xsync script executable:

    chmod +x xsync

Test the script:

    xsync xsync

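# Environment Variable Configuration
On Linux, environment variables can be set in several files, such as /etc/profile, /etc/profile.d/*.sh, ~/.bashrc, and ~/.bash_profile. The relationship and differences between these files are explained below.
bash runs as either a login shell or a non-login shell.
For example, logging in at a terminal with a user name and password gives you a login shell. Running a command such as ssh ha02 command executes command on ha02 in a non-login shell.
The main difference between the two is which startup files they load: a login shell loads /etc/profile, ~/.bash_profile, and ~/.bashrc, while a non-login shell loads only ~/.bashrc.
Both ~/.bashrc (in practice /etc/bashrc, which ~/.bashrc sources) and /etc/profile run a loop that sources every script matching /etc/profile.d/*.sh.
As a result, both login and non-login shells load the environment variables defined in /etc/profile.d/*.sh.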
# JDK
Remove any existing JDK:

    sudo rpm -qa | grep -i java | xargs -n1 sudo rpm -e --nodeps

Upload the archive and extract it.
Add the environment variables:

    sudo vim /etc/profile.d/my_env.sh

    #JAVA_HOME
    export JAVA_HOME=/opt/module/jdk1.8.0_212
    export PATH=$PATH:$JAVA_HOME/bin

Apply the environment variables:

    source /etc/profile.d/my_env.sh

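# Cluster Log Generation Script
The log generation script runs the generator on two machines, ha01 and ha02.
Create the script lg.sh in /home/damoncai/bin with the following content:

    #!/bin/bash
    for i in ha01 ha02; do
        echo "========== $i =========="
        ssh $i "cd /opt/module/applog/; java -jar gmall2020-mock-log-2021-01-22.jar >/dev/null 2>&1 &"
    done

Notes:
- /opt/module/applog/ is the directory that holds the jar and its configuration files.
- /dev/null is the Linux null device; anything written to it is discarded, hence the nickname "black hole".
- Standard input (0): read from the keyboard, /proc/self/fd/0
- Standard output (1): written to the screen (the console), /proc/self/fd/1
- Standard error (2): written to the screen (the console), /proc/self/fd/2

Make the script executable.
Upload the jar and its configuration files to /opt/module/applog/ on ha02.
Run the script.
Check the generated log data.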
# Log Data Collection Module
# Script for Viewing All Cluster Processes
Create the script xcall.sh in /home/damoncai/bin with the following content:

    #!/bin/bash
    for i in ha01 ha02 ha03
    do
        echo --------- $i ----------
        ssh $i "$*"
    done

Make the script executable, then run it:

    xcall.sh jps

# Hadoop Installation
Service | Server ha01 | Server ha02 | Server ha03 |
---|---|---|---|
HDFS | NameNode, DataNode | DataNode | DataNode, SecondaryNameNode |
YARN | NodeManager | ResourceManager, NodeManager | NodeManager |

Note: do not install the NameNode and the SecondaryNameNode on the same server.
Note: the ResourceManager also uses a lot of memory; do not place it on the same machine as the NameNode or the SecondaryNameNode.
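# Fully Distributed Mode (Development Focus)
1) Prepare three machines (firewall disabled, static IPs, host names set)
2) Install the JDK
3) Configure the environment variables
4) Install Hadoop
5) Configure the environment variables
6) Configure the cluster
7) Start the daemons one at a time
8) Configure ssh
9) Start the whole cluster and test it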
# Steps
Upload the Hadoop archive and extract it.
Add Hadoop to the environment variables:

    sudo vim /etc/profile.d/my_env.sh

    #HADOOP_HOME
    export HADOOP_HOME=/opt/module/hadoop-3.1.3
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin

Distribute the environment variable file:

    sudo /home/damoncai/bin/xsync /etc/profile.d/my_env.sh

Source the file on all three nodes so that it takes effect:

    source /etc/profile.d/my_env.sh

Configure the cluster
Core configuration file (core-site.xml)

    vim core-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- Address of the NameNode -->
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://ha01:8020</value>
        </property>
        <!-- Directory where Hadoop stores its data -->
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/module/hadoop-3.1.3/data</value>
        </property>
        <!-- Static user for the HDFS web UI: damoncai -->
        <property>
            <name>hadoop.http.staticuser.user</name>
            <value>damoncai</value>
        </property>
        <!-- Hosts from which damoncai (superuser) may act as a proxy -->
        <property>
            <name>hadoop.proxyuser.damoncai.hosts</name>
            <value>*</value>
        </property>
        <!-- Groups whose users damoncai (superuser) may impersonate -->
        <property>
            <name>hadoop.proxyuser.damoncai.groups</name>
            <value>*</value>
        </property>
        <!-- Users that damoncai (superuser) may impersonate -->
        <property>
            <name>hadoop.proxyuser.damoncai.users</name>
            <value>*</value>
        </property>
    </configuration>

HDFS configuration file (hdfs-site.xml)

    vim hdfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- NameNode web UI address -->
        <property>
            <name>dfs.namenode.http-address</name>
            <value>ha01:9870</value>
        </property>
        <!-- SecondaryNameNode web UI address -->
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>ha03:9868</value>
        </property>
        <!-- Test environment: HDFS replication factor of 1 -->
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
    </configuration>

YARN configuration file (yarn-site.xml)

    vim yarn-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- Use the MapReduce shuffle service -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <!-- Address of the ResourceManager -->
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>ha02</value>
        </property>
        <!-- Environment variables inherited by containers -->
        <property>
            <name>yarn.nodemanager.env-whitelist</name>
            <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
        </property>
        <!-- Minimum and maximum memory a YARN container may be allocated -->
        <property>
            <name>yarn.scheduler.minimum-allocation-mb</name>
            <value>512</value>
        </property>
        <property>
            <name>yarn.scheduler.maximum-allocation-mb</name>
            <value>4096</value>
        </property>
        <!-- Physical memory the NodeManager may hand out to containers -->
        <property>
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>4096</value>
        </property>
        <!-- Disable YARN's virtual memory limit check -->
        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>
    </configuration>

MapReduce configuration file (mapred-site.xml)

    vim mapred-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <!-- Run MapReduce jobs on YARN -->
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

Configure workers

    vim /opt/module/hadoop-3.1.3/etc/hadoop/workers

    ha01
    ha02
    ha03

Note: entries in this file must not have trailing spaces, and the file must not contain blank lines.
Configure the history server
Add the following to mapred-site.xml:

    <!-- History server address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>ha01:10020</value>
    </property>
    <!-- History server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>ha01:19888</value>
    </property>

Configure log aggregation
Log aggregation: after an application finishes, its run logs are uploaded to HDFS.
Benefit: run details are easy to inspect, which helps development and debugging.
Note: enabling log aggregation requires restarting the NodeManager, the ResourceManager, and the HistoryServer.
Add the following to yarn-site.xml:

    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Address of the log server -->
    <property>
        <name>yarn.log.server.url</name>
        <value>http://ha01:19888/jobhistory/logs</value>
    </property>
    <!-- Keep logs for 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

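Distribute Hadoop

    xsync /opt/module/hadoop-3.1.3/

Start the whole cluster
If the cluster is being started for the first time, format the NameNode on ha01. (Before formatting, be sure to stop every NameNode and DataNode process left over from a previous start, then delete the data and logs directories.)

    bin/hdfs namenode -format

Start HDFS:

    sbin/start-dfs.sh

Start YARN on the node configured as the ResourceManager (ha02):

    sbin/start-yarn.sh
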
View the HDFS web UI at http://ha01:9870/
Hadoop cluster start/stop script
Go to /home/damoncai/bin and create the script:

    vim hdp.sh

Enter the following content:

    #!/bin/bash
    if [ $# -lt 1 ]
    then
        echo "No Args Input..."
        exit
    fi

    case $1 in
    "start")
        echo " =================== starting hadoop cluster ==================="
        echo " --------------- starting hdfs ---------------"
        ssh ha01 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
        echo " --------------- starting yarn ---------------"
        ssh ha02 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
        echo " --------------- starting historyserver ---------------"
        ssh ha01 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
    ;;
    "stop")
        echo " =================== stopping hadoop cluster ==================="
        echo " --------------- stopping historyserver ---------------"
        ssh ha01 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
        echo " --------------- stopping yarn ---------------"
        ssh ha02 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
        echo " --------------- stopping hdfs ---------------"
        ssh ha01 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
    ;;
    *)
        echo "Input Args Error..."
    ;;
    esac

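# Project Experience
# Project Experience: Multiple HDFS Storage Directories
- Add a new disk to the Linux system
- Check the disk layout of the production servers
- Configure multiple directories in hdfs-site.xml, paying attention to access permissions on the newly mounted disks

The path where an HDFS DataNode stores its data is set by the dfs.datanode.data.dir parameter, whose default value is file://${hadoop.tmp.dir}/dfs/data. If a server has several disks, this parameter must be changed. For the disk layout described above, it should be set as follows.

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///dfs/data1,file:///hd2/dfs/data2,file:///hd3/dfs/data3,file:///hd4/dfs/data4</value>
    </property>

Note: because the disk layout differs from server to server, this configuration does not need to be distributed after it is set.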
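# Cluster Data Balancing
# Balancing Data Across Nodes
Start the balancer:

    start-balancer.sh -threshold 10

The argument 10 means that disk space utilization across the nodes of the cluster should differ by no more than 10%; adjust it to suit your situation.
Stop the balancer:

    stop-balancer.sh

Note: because HDFS has to start a separate Rebalance Server to carry out the rebalance, try not to run start-balancer.sh on the NameNode; pick a relatively idle machine instead.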
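To make the threshold concrete, here is a minimal Python sketch of one way to read it: a node is out of balance when its utilization differs from the cluster average by more than the threshold. The function name, the sample percentages, and the assumption that all nodes have equal capacity are illustrative only; this is not the HDFS balancer's actual code.

```python
def nodes_outside_threshold(used_pct_by_node, threshold=10.0):
    """Return {node: deviation} for nodes further than `threshold`
    percentage points from the mean utilization.

    Assumes every node has the same capacity, so the mean of the
    per-node percentages stands in for cluster-wide utilization.
    """
    mean = sum(used_pct_by_node.values()) / len(used_pct_by_node)
    return {
        node: round(used - mean, 2)
        for node, used in used_pct_by_node.items()
        if abs(used - mean) > threshold
    }


# Hypothetical utilization figures for the three nodes in this guide.
usage = {"ha01": 72.0, "ha02": 48.0, "ha03": 45.0}
print(nodes_outside_threshold(usage))
```

With these sample numbers the mean is 55%, so only ha01 falls outside a threshold of 10.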
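# Balancing Data Across Disks
Generate a balancing plan (this cluster has only one disk per node, so no plan is produced):

    hdfs diskbalancer -plan ha02

Execute the plan:

    hdfs diskbalancer -execute ha02.plan.json

Check the status of the current balancing task:

    hdfs diskbalancer -query ha02

Cancel the balancing task:

    hdfs diskbalancer -cancel ha02.plan.json
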
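# Project Experience: Enabling LZO Compression
Building hadoop-lzo
Hadoop does not support LZO compression out of the box, so the open-source hadoop-lzo component provided by Twitter is needed. hadoop-lzo is compiled against Hadoop and LZO; the build steps follow.
Prerequisites (everything except Maven can be installed with yum: yum -y install gcc-c++ lzo-devel zlib-devel autoconf automake libtool)
- maven (download and install it, set the environment variables, and add the Aliyun mirror to settings.xml)
- gcc-c++
- zlib-devel
- autoconf
- automake
- libtool

Download, build, and install LZO:

    wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz
    tar -zxvf lzo-2.10.tar.gz
    cd lzo-2.10
    ./configure --prefix=/usr/local/hadoop/lzo/
    make
    make install
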
Build the hadoop-lzo source
Download the hadoop-lzo source from https://github.com/twitter/hadoop-lzo/archive/master.zip
After extracting it, edit pom.xml:

    <hadoop.current.version>3.1.3</hadoop.current.version>

Declare two temporary environment variables:

    export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
    export LIBRARY_PATH=/usr/local/hadoop/lzo/lib

Build
Enter hadoop-lzo-master and run the Maven build command:

    mvn package -Dmaven.test.skip=true

In the target directory, hadoop-lzo-0.4.21-SNAPSHOT.jar is the successfully built hadoop-lzo component.
Place the compiled hadoop-lzo-0.4.20.jar in hadoop-3.1.3/share/hadoop/common/
Sync hadoop-lzo-0.4.20.jar to ha02 and ha03:

    xsync hadoop-lzo-0.4.20.jar

Add the following to core-site.xml to enable LZO compression:

    <configuration>
        <property>
            <name>io.compression.codecs</name>
            <value>
                org.apache.hadoop.io.compress.GzipCodec,
                org.apache.hadoop.io.compress.DefaultCodec,
                org.apache.hadoop.io.compress.BZip2Codec,
                org.apache.hadoop.io.compress.SnappyCodec,
                com.hadoop.compression.lzo.LzoCodec,
                com.hadoop.compression.lzo.LzopCodec
            </value>
        </property>
        <property>
            <name>io.compression.codec.lzo.class</name>
            <value>com.hadoop.compression.lzo.LzoCodec</value>
        </property>
    </configuration>

Sync core-site.xml to ha02 and ha03:

    xsync core-site.xml

Start the cluster and check it:

    sbin/start-dfs.sh
    sbin/start-yarn.sh

Test: prepare the data

    hadoop fs -mkdir /input
    hadoop fs -put README.txt /input

Test: compression

    hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec /input /output

# Project Experience: Creating an LZO Index
Create an index for an LZO file
An LZO-compressed file can only be split if it has an index, so the index has to be created manually. Without one, an LZO file yields a single split.

    hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer big_file.lzo

Test
Upload bigtable.lzo (200 MB) to the /input directory on the cluster:

    hadoop fs -mkdir /input
    hadoop fs -put bigtable.lzo /input

Run the wordcount program:

    hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat /input /output1

Build an index for the uploaded LZO file:

    hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /input/bigtable.lzo

Run the WordCount program again:

    hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat /input /output2

Note: the jobs above may throw an exception while running.
Fix: add the following to /opt/module/hadoop-3.1.3/etc/hadoop/yarn-site.xml on ha01, distribute it to ha02 and ha03, and restart the cluster.

    <!-- Whether to start a thread that checks the virtual memory used by each task and kills any task that exceeds its allocation; the default is true -->
    <property>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
    </property>

# Project Experience: Hadoop Parameter Tuning
# HDFS Parameter Tuning (hdfs-site.xml)
The number of Namenode RPC server threads that listen to requests from clients. If dfs.namenode.servicerpc-address is not configured then Namenode RPC server threads listen to requests from all nodes.
The NameNode has a pool of worker threads that handles concurrent heartbeats from DataNodes and concurrent metadata operations from clients.
For large clusters, or clusters with many clients, the dfs.namenode.handler.count parameter usually needs to be raised above its default of 10.

    <property>
        <name>dfs.namenode.handler.count</name>
        <value>21</value>
    </property>

The value can be computed with a few lines of Python as 20 times the natural logarithm of the cluster size. The session below uses a cluster of 8 nodes:

    [atguigu@hadoop102 ~]$ python
    Python 2.7.5 (default, Apr 11 2018, 07:36:10)
    [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import math
    >>> print int(20*math.log(8))
    41
    >>> quit()
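
The same calculation can be written for Python 3 as a small function. This is a sketch of the rule of thumb used in the session above; the function name is made up for illustration, and treating the three-node cluster in this guide as the input is an assumption that happens to match the configured value of 21.

```python
import math


def namenode_handler_count(cluster_size):
    """Rule of thumb from the session above: 20 * ln(cluster size), truncated."""
    return int(20 * math.log(cluster_size))


for nodes in (3, 8):
    print(nodes, namenode_handler_count(nodes))
```

For 3 nodes this gives 21, and for 8 nodes it gives 41.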
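# YARN Parameter Tuning (yarn-site.xml)
(1) Scenario: 7 machines in total, several hundred million records per day, flowing from the data source -> Flume -> Kafka -> HDFS -> Hive.
Problem: statistics are computed mainly with HiveSQL. There is no data skew, small files have already been merged, JVM reuse is enabled, IO is not blocked, and less than 50% of memory is in use. Jobs still run very slowly, and the whole cluster goes down when a traffic peak arrives. Is there a way to optimize this?
(2) Solution:
Keep the NodeManager memory setting close to the server's physical memory. For example, if a server has 128 GB of memory but the NodeManager keeps its default of 8 GB, at most 8 GB can be used unless the parameter is changed. Likewise, keep the number of CPU cores used by the NodeManager close to the number of cores on the server.
① yarn.nodemanager.resource.memory-mb: memory available to the NodeManager
② yarn.nodemanager.resource.cpu-vcores: CPU cores available to the NodeManager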
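# Zookeeper Installation
# Installing ZK
Upload the archive and extract it.
Distribute ZK:

    xsync zookeeper-3.5.7

Create zkData under /opt/module/zookeeper-3.5.7/:

    mkdir zkData

Create a file named myid under /opt/module/zookeeper-3.5.7/zkData:

    vi myid

Put the number that matches this server in the file: 1
Copy the configured zookeeper to the other machines and change the content of myid to 2 and 3 respectively.
Configure zoo.cfg
In /opt/module/zookeeper-3.5.7/conf, rename zoo_sample.cfg to zoo.cfg:

    mv zoo_sample.cfg zoo.cfg
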
Open zoo.cfg:

    vim zoo.cfg

Change the data directory setting:

    dataDir=/opt/module/zookeeper-3.5.7/zkData

Add the following configuration:

    #######################cluster##########################
    server.1=ha01:2888:3888
    server.2=ha02:2888:3888
    server.3=ha03:2888:3888

Sync the zoo.cfg configuration file:

    xsync zoo.cfg

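Configuration parameters explained
Each entry has the form server.A=B:C:D.
- A is a number that identifies the server. In cluster mode, a file named myid in the dataDir directory holds the value of A. Zookeeper reads this file at startup and compares its content with the entries in zoo.cfg to work out which server it is.
- B is the address of the server.
- C is the port that a Follower on this server uses to exchange information with the cluster's Leader.
- D is the port the servers use to talk to each other during an election, which is needed to choose a new Leader if the current Leader goes down.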
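
As an illustration of the server.A=B:C:D layout, the following Python sketch splits one entry into its four parts and checks it against the content of a myid file. The function and key names are invented for this example; Zookeeper performs the equivalent matching internally.

```python
def parse_server_line(line):
    """Split a 'server.A=B:C:D' entry from zoo.cfg into its parts."""
    key, value = line.strip().split("=", 1)
    host, quorum_port, election_port = value.split(":")
    return {
        "id": int(key.split(".", 1)[1]),      # A: server number, matches myid
        "host": host,                         # B: server address
        "quorum_port": int(quorum_port),      # C: Follower <-> Leader traffic
        "election_port": int(election_port),  # D: Leader election traffic
    }


def is_this_server(entry, myid_text):
    """True when the entry's id equals the number stored in myid."""
    return entry["id"] == int(myid_text.strip())


entry = parse_server_line("server.2=ha02:2888:3888")
print(entry, is_this_server(entry, "2\n"))
```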
Cluster operations
Start Zookeeper on each machine:

    bin/zkServer.sh start

# ZK Cluster Start/Stop Script
Create the script in /home/damoncai/bin on ha01:

    vim zk.sh

Write the following content in the script:

    #!/bin/bash
    case $1 in
    "start"){
        for i in ha01 ha02 ha03
        do
            echo ---------- zookeeper $i start ------------
            ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh start"
        done
    };;
    "stop"){
        for i in ha01 ha02 ha03
        do
            echo ---------- zookeeper $i stop ------------
            ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh stop"
        done
    };;
    "status"){
        for i in ha01 ha02 ha03
        do
            echo ---------- zookeeper $i status ------------
            ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh status"
        done
    };;
    esac

Make the script executable:

    chmod u+x zk.sh

Use the script to start, stop, or check the Zookeeper cluster:

    zk.sh start|stop|status

# Kafka Installation
# Installing the Kafka Cluster
Upload the archive and extract it.
Rename the extracted directory to kafka.
Create a logs folder under /opt/module/kafka.
Edit the configuration file with vi server.properties and change or add the following:

    # Globally unique broker ID; must not be repeated
    broker.id=0
    # Enable topic deletion
    delete.topic.enable=true
    # Directory where Kafka stores its log data
    log.dirs=/opt/module/kafka/data
    # Zookeeper cluster connection string
    zookeeper.connect=ha01:2181,ha02:2181,ha03:2181/kafka

Configure the environment variables:

    sudo vi /etc/profile.d/my_env.sh

    #KAFKA_HOME
    export KAFKA_HOME=/opt/module/kafka
    export PATH=$PATH:$KAFKA_HOME/bin

    source /etc/profile.d/my_env.sh

Distribute the installation.
On ha02 and ha03, edit /opt/module/kafka/config/server.properties and set broker.id=1 and broker.id=2 respectively.
Start the cluster by running the following on every machine:

    bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties

Stop the cluster:

    bin/kafka-server-stop.sh

# Kafka Cluster Start/Stop Script
Create the script kf.sh in /home/damoncai/bin with the following content:

    #!/bin/bash
    case $1 in
    "start"){
        for i in ha01 ha02 ha03
        do
            echo " -------- starting Kafka on $i -------"
            ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
        done
    };;
    "stop"){
        for i in ha01 ha02 ha03
        do
            echo " -------- stopping Kafka on $i -------"
            ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh stop"
        done
    };;
    esac

Make the script executable:

    chmod u+x kf.sh

Start or stop the cluster:

    kf.sh start | stop

# Common Kafka Commands
List the Kafka topics:

    bin/kafka-topics.sh --zookeeper ha01:2181/kafka --list

Create a Kafka topic:

    bin/kafka-topics.sh --zookeeper ha01:2181,ha02:2181,ha03:2181/kafka --create --replication-factor 1 --partitions 1 --topic topic_log

Delete a Kafka topic:

    bin/kafka-topics.sh --delete --zookeeper ha01:2181,ha02:2181,ha03:2181/kafka --topic topic_log

Produce messages:

    bin/kafka-console-producer.sh \
    --broker-list ha01:9092 --topic topic_log
    >hello world
    >atguigu atguigu

Consume messages:

    bin/kafka-console-consumer.sh \
    --bootstrap-server ha01:9092 --from-beginning --topic topic_log

Show the details of a Kafka topic:

    bin/kafka-topics.sh --zookeeper ha01:2181/kafka \
    --describe --topic topic_log

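# Project Experience: Estimating the Number of Kafka Machines
Number of Kafka machines (rule of thumb) = 2 * (peak production rate * replication factor / 100) + 1
Once the peak production rate is known, the chosen replication factor gives an estimate of how many Kafka machines to deploy.
Peak production rate
The peak production rate can be obtained by stress testing.
Replication factor
The default replication factor is 1. In companies, 2 or 3 are both used, with 2 being more common.
More replicas improve reliability but reduce network transfer efficiency.
For example, suppose the peak production rate is 50 MB/s and the replication factor is 2.
Number of Kafka machines = 2 * (50 * 2 / 100) + 1 = 3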
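
The rule of thumb is easy to wrap in a helper. In this Python sketch the function name is hypothetical, and rounding up with math.ceil is an added assumption for inputs that do not divide evenly; the original formula does not say how to round.

```python
import math


def kafka_machine_count(peak_mb_per_s, replication_factor):
    """2 * (peak rate * replicas / 100) + 1, rounded up to a whole machine."""
    return math.ceil(2 * (peak_mb_per_s * replication_factor / 100) + 1)


# The worked example: 50 MB/s peak, replication factor 2.
print(kafka_machine_count(50, 2))
```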
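# Project Experience: Kafka Stress Testing
Kafka stress testing
Use the scripts that ship with Kafka to stress test it:
kafka-consumer-perf-test.sh
kafka-producer-perf-test.sh
With disk read/write speed held fixed, a stress test shows where the bottleneck appears (CPU, memory, or network IO). It is usually network IO.
Kafka Producer stress test
Preparing the test environment
Limit the network bandwidth of ha01, ha02, and ha03 to 100 Mbps each.
Shut down ha01 and clone ha04 from it (changing the IP address and host name).
Leave the bandwidth of ha04 unlimited.
Create a topic named test with 3 partitions and 2 replicas:

    bin/kafka-topics.sh --zookeeper ha01:2181,ha02:2181,ha03:2181/kafka --create --replication-factor 2 --partitions 3 --topic test

Both perf-test scripts are in /opt/module/kafka/bin. Run the producer test:

    bin/kafka-producer-perf-test.sh --topic test --record-size 100 --num-records 10000000 --throughput -1 --producer-props bootstrap.servers=ha01:9092,ha02:9092,ha03:9092

Notes:
record-size is the size of one message, in bytes.
num-records is the total number of messages to send.
throughput is the number of messages per second; -1 means no throttling, so data is produced as fast as possible and the producer's maximum throughput can be measured.
The total network bandwidth of the three machines ha01, ha02, and ha03 is about 30 MB/s. With two replicas, Kafka's throughput is 30 MB/s ÷ 2 (replicas) = 15 MB/s.
Conclusion: both network bandwidth and the replication factor affect throughput.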
Tuning batch.size
The default batch.size is 16k.
A small batch.size lowers throughput. For example, a batch size of 0 disables batching entirely, so messages are sent one at a time.
A large batch.size increases send latency. For example, if the batch is set to 64k but it takes 5 seconds to fill, the message is delayed by 5 seconds.

    bin/kafka-producer-perf-test.sh --topic test --record-size 100 --num-records 10000000 --throughput -1 --producer-props bootstrap.servers=ha01:9092,ha02:9092,ha03:9092 batch.size=500

Output:

    69169 records sent, 13833.8 records/sec (1.32 MB/sec), 2517.6 ms avg latency, 4299.0 ms max latency.
    105372 records sent, 21074.4 records/sec (2.01 MB/sec), 6748.4 ms avg latency, 9016.0 ms max latency.
    113188 records sent, 22637.6 records/sec (2.16 MB/sec), 11348.0 ms avg latency, 13196.0 ms max latency.
    108896 records sent, 21779.2 records/sec (2.08 MB/sec), 12272.6 ms avg latency, 12870.0 ms max latency.

linger.ms
What happens if batch.size is 64k but the batch still has not filled after, say, 10 minutes?
Set linger.ms. With linger.ms=5, for example, the data is sent after 5 ms even if the batch has not reached 64k.
Summary
When both batch.size and linger.ms are set, the messages are sent as soon as either condition is met.
Kafka has to balance high throughput against latency.
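
The "whichever comes first" rule can be expressed as a one-line predicate. This Python sketch only models the decision described above; the function name and arguments are illustrative and it is not the Kafka producer's implementation. The defaults use the 16k batch size and the 5 ms linger value from the text.

```python
def should_send(batch_bytes, waited_ms, batch_size=16384, linger_ms=5):
    """Send when the batch is full OR the linger time has elapsed."""
    return batch_bytes >= batch_size or waited_ms >= linger_ms


print(should_send(16384, 0))  # full batch, no waiting needed
print(should_send(100, 5))    # nearly empty batch, but linger.ms has elapsed
print(should_send(100, 4))    # neither condition met yet
```

A larger batch_size favors throughput, while a smaller linger_ms caps the extra latency a message can pick up while waiting for its batch to fill.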
Kafka Consumer stress test
For the consumer test, if none of the four factors (IO, CPU, memory, network) can be changed, consider increasing the number of partitions to improve performance.

    bin/kafka-consumer-perf-test.sh --broker-list ha01:9092,ha02:9092,ha03:9092 --topic test --fetch-size 10000 --messages 10000000 --threads 1

--broker-list: address of the Kafka cluster
--topic: name of the topic
--fetch-size: amount of data requested per fetch
--messages: total number of messages to consume
Test results:
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2021-08-03 21:17:21:778, 2021-08-03 21:18:19:775, 514.7169, 8.8749, 5397198, 93059.9514
The columns give the test start time and end time; 514.7169 MB were consumed in total, at a throughput of 8.8749 MB/s.
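
The MB.sec column can be cross-checked from the other columns: it is the data consumed divided by the elapsed time. The Python sketch below recomputes it from the result row above; the helper name is hypothetical, and the timestamp format string is inferred from the layout of that row.

```python
from datetime import datetime

# Timestamps look like "2021-08-03 21:17:21:778" (milliseconds after the last colon).
TIME_FORMAT = "%Y-%m-%d %H:%M:%S:%f"


def throughput_mb_per_s(start, end, consumed_mb):
    """Data consumed (MB) divided by the elapsed wall-clock seconds."""
    elapsed = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return consumed_mb / elapsed.total_seconds()


rate = throughput_mb_per_s("2021-08-03 21:17:21:778", "2021-08-03 21:18:19:775", 514.7169)
print(round(rate, 4))
```

The elapsed time here is 57.997 seconds, which reproduces the reported figure of roughly 8.87 MB/s.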
Tuning fetch-size
Increase fetch-size and observe the consumer throughput:

    bin/kafka-consumer-perf-test.sh --broker-list ha01:9092,ha02:9092,ha03:9092 --topic test --fetch-size 100000 --messages 10000000 --threads 1

Test results:
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
2021-08-03 21:22:57:671, 2021-08-03 21:23:41:938, 514.7169, 11.6276, 5397198, 121923.7355
Summary
Throughput is affected by network bandwidth and by fetch-size.
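# Project Experience: Calculating the Number of Kafka Partitions
(1) Create a topic with a single partition.
(2) Measure the producer throughput and the consumer throughput of this topic.
(3) Call the two values Tp and Tc, in units such as MB/s.
(4) If the overall target throughput is Tt, then number of partitions = Tt / min(Tp, Tc).
Example: producer throughput = 20 MB/s, consumer throughput = 50 MB/s, target throughput = 100 MB/s.
Number of partitions = 100 / 20 = 5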
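
The four steps above reduce to a single expression. In this Python sketch the function name is illustrative, and rounding up with math.ceil is an added assumption for targets that are not an exact multiple of the slower side's throughput.

```python
import math


def partition_count(target_mb_s, producer_mb_s, consumer_mb_s):
    """Tt / min(Tp, Tc), rounded up to a whole partition."""
    return math.ceil(target_mb_s / min(producer_mb_s, consumer_mb_s))


# The worked example: Tp = 20 MB/s, Tc = 50 MB/s, Tt = 100 MB/s.
print(partition_count(100, 20, 50))
```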
https://blog.csdn.net/weixin_42641909/article/details/89294698
The number of partitions is usually set to between 3 and 10.
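# Log Collection with Flume
# Flume Installation
Cluster plan:
Service | Server ha01 | Server ha02 | Server ha03 |
---|---|---|---|
Flume (log collection) | Flume | Flume | |

Installation links
- Flume website: http://flume.apache.org/
- Documentation: http://flume.apache.org/FlumeUserGuide.html
- Downloads: http://archive.apache.org/dist/flume/

Installation and deployment
Upload the archive and extract it.
Rename the directory to flume.
Delete guava-11.0.2.jar from the lib folder for compatibility with Hadoop 3.1.3.
Note: any server node where guava-11.0.2.jar has been deleted must have the Hadoop environment variables configured; otherwise Flume throws an exception.
In flume/conf, rename flume-env.sh.template to flume-env.sh and configure it:

    export JAVA_HOME=/opt/module/jdk1.8.0_212

Distribute flume:

    xsync flume
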
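# Project Experience: Choosing Flume Components
Source
Advantages of Taildir Source over Exec Source and Spooling Directory Source
TailDir Source: resumes from where it left off and supports multiple directories. Before Flume 1.6 you had to write a custom Source that recorded the read position of each file in order to resume. It does not lose data, but it may produce duplicates.
Exec Source collects data in real time, but data is lost if Flume is not running or the shell command fails.
Spooling Directory Source monitors a directory and supports resuming from where it left off.
How large should batchSize be?
With events of about 1 KB, 500 to 1000 is appropriate (the default is 100).
Channel
Using a Kafka Channel removes the need for a Sink and improves efficiency. A KafkaChannel stores its data in Kafka, so the data is kept on disk.
Note that before Flume 1.7 the Kafka Channel was rarely used, because the parseAsFlumeEvent setting was found to have no effect: whether it was set to true or false, the data was always converted to a Flume Event. As a result, the information in the Flume headers was always written into the Kafka message together with the content, which is clearly not what is wanted here; only the content should be written.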
# Log Collection Flume Configuration
Flume configuration overview
Flume reads the log files directly; the log file names have the form app.yyyy-mm-dd.log.
The Flume configuration is as follows.
Create the file file-flume-kafka.conf in /opt/module/flume/conf with this content:

    # Name the components
    a1.sources = r1
    a1.channels = c1

    # Describe the source
    a1.sources.r1.type = TAILDIR
    a1.sources.r1.filegroups = f1
    a1.sources.r1.filegroups.f1 = /opt/module/applog/log/app.*
    a1.sources.r1.positionFile = /opt/module/flume/taildir_position.json
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = top.damoncai.flume.interceptor.ETLInterceptor$Builder

    # Describe the channel
    a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
    a1.channels.c1.kafka.bootstrap.servers = ha01:9092,ha02:9092
    a1.channels.c1.kafka.topic = topic_log
    a1.channels.c1.parseAsFlumeEvent = false

    # Bind the source to the channel
    a1.sources.r1.channels = c1

# Flume Interceptor
Create the project
Add the dependencies:

    <dependencies>
        <dependency>
            <groupId>org.apache.flume</groupId>
            <artifactId>flume-ng-core</artifactId>
            <version>1.9.0</version>
            <scope>provided</scope>
        </dependency>
        <dependency>
            <groupId>com.alibaba</groupId>
            <artifactId>fastjson</artifactId>
            <version>1.2.62</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>2.3.2</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
            <plugin>
                <artifactId>maven-assembly-plugin</artifactId>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>make-assembly</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

Note: the provided scope means the jar is used at compile time but left out of the package. The Flume jars already exist on the cluster, so they are only needed for local compilation.
Create the JSONUtils class in the top.damoncai.flume.interceptor package:

    package top.damoncai.flume.interceptor;

    import com.alibaba.fastjson.JSON;
    import com.alibaba.fastjson.JSONException;

    public class JSONUtils {
        // Returns true when the log line parses as JSON
        public static boolean isJSONValidate(String log) {
            try {
                JSON.parse(log);
                return true;
            } catch (JSONException e) {
                return false;
            }
        }
    }

Create the ETLInterceptor class in the top.damoncai.flume.interceptor package:

    package top.damoncai.flume.interceptor;

    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    import java.nio.charset.StandardCharsets;
    import java.util.Iterator;
    import java.util.List;

    public class ETLInterceptor implements Interceptor {

        @Override
        public void initialize() {
        }

        // Keep the event only if its body is valid JSON
        @Override
        public Event intercept(Event event) {
            byte[] body = event.getBody();
            String log = new String(body, StandardCharsets.UTF_8);
            if (JSONUtils.isJSONValidate(log)) {
                return event;
            } else {
                return null;
            }
        }

        // Remove every event that the single-event check rejects
        @Override
        public List<Event> intercept(List<Event> list) {
            Iterator<Event> iterator = list.iterator();
            while (iterator.hasNext()) {
                Event next = iterator.next();
                if (intercept(next) == null) {
                    iterator.remove();
                }
            }
            return list;
        }

        public static class Builder implements Interceptor.Builder {
            @Override
            public Interceptor build() {
                return new ETLInterceptor();
            }

            @Override
            public void configure(Context context) {
            }
        }

        @Override
        public void close() {
        }
    }

Package the project
Put the packaged jar in /opt/module/flume/lib on ha01 first.
Distribute Flume to ha02 and ha03:

    xsync flume/

Start Flume on ha01 and ha02:

    bin/flume-ng agent --name a1 --conf-file conf/file-flume-kafka.conf &

# Testing the Flume-Kafka Channel
Generate logs:

    lg.sh

Consume the Kafka data and check whether data arrives on the console:

    bin/kafka-console-consumer.sh \
    --bootstrap-server ha01:9092 --from-beginning --topic topic_log

# Log Collection Flume Start/Stop Script
Create the script f1.sh in /home/damoncai/bin with the following content:

    #!/bin/bash
    case $1 in
    "start"){
        for i in ha01 ha02
        do
            echo " -------- starting collection flume on $i -------"
            ssh $i "nohup /opt/module/flume/bin/flume-ng agent --conf-file /opt/module/flume/conf/file-flume-kafka.conf --name a1 -Dflume.root.logger=INFO,LOGFILE >/opt/module/flume/log1.txt 2>&1 &"
        done
    };;
    "stop"){
        for i in ha01 ha02
        do
            echo " -------- stopping collection flume on $i -------"
            ssh $i "ps -ef | grep file-flume-kafka | grep -v grep | awk '{print \$2}' | xargs -n1 kill -9 "
        done
    };;
    esac

Note 1: nohup keeps the process running after you log out or close the terminal. The name means "no hang up": the command runs without being hung up.
Note 2: awk uses whitespace as its default field separator.
Note 3: inside double quotes, $2 would be expanded as the script's second argument. What is meant here is awk's second field, so it has to be escaped and written as \$2.
Note 4: xargs takes the output of the preceding command and passes it as arguments to the following command.
Make the script executable:

    chmod u+x f1.sh

Start or stop the collection Flume agents with f1.sh:

    f1.sh start | stop

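# Flume for Consuming Kafka Data
Cluster plan
Service | Server ha01 | Server ha02 | Server ha03 |
---|---|---|---|
Flume (consumes Kafka) | | | Flume |

# Project Experience: Choosing Flume Components
Differences between FileChannel and MemoryChannel
MemoryChannel transfers data faster, but because the data lives in the JVM heap, it is lost if the Agent process dies. It suits workloads that do not demand high data quality.
FileChannel is slower than MemoryChannel, but it offers stronger data safety: data can be recovered after the Agent process dies.
Choosing between them:
Financial companies, and companies that need monetary figures to be exact, usually choose FileChannel.
For ordinary log data (at JD.com, losing 1 to 2 million records a day internally is considered quite normal), MemoryChannel is the usual choice.
FileChannel optimization
Point dataDirs at several paths, each on a different disk, to increase Flume throughput.
The official documentation says:
Comma separated list of directories for storing log files. Using multiple directories on separate disks can improve file channel peformance
Also try to place checkpointDir and backupCheckpointDir in directories on different disks, so that if the checkpoint is damaged, data can be restored quickly from backupCheckpointDir.
Sink: HDFS Sink
What is the impact of storing a large number of small files in HDFS?
**Metadata:** every small file has its own metadata, including the file path, file name, owner, group, permissions, and creation time, all of which is held in NameNode memory. Too many small files therefore consume a large amount of memory on the NameNode server and hurt its performance and service life.
**Computation:** by default MR starts one Map task per small file, which severely hurts compute performance. It also increases disk seek time.
Handling small files in HDFS
With the official defaults for the three parameters hdfs.rollInterval, hdfs.rollSize, and hdfs.rollCount, writing to HDFS produces small files.
With hdfs.rollInterval=3600, hdfs.rollSize=134217728, and hdfs.rollCount=0 working together, the effect is:
- A new file is rolled when the current file reaches 128 MB
- A new file is rolled when the current file has been open for more than 3600 seconds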
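
The combined effect of the three roll parameters can be sketched as a small decision function. This Python example is an illustration only, not the HDFS Sink's code; the function and argument names are invented, and it relies on the documented convention that a value of 0 disables the corresponding trigger.

```python
def should_roll(age_s, size_bytes, event_count,
                roll_interval=3600, roll_size=134217728, roll_count=0):
    """Roll the file as soon as any enabled limit is reached (0 = disabled)."""
    def reached(value, limit):
        return limit > 0 and value >= limit

    return (reached(age_s, roll_interval)
            or reached(size_bytes, roll_size)
            or reached(event_count, roll_count))


print(should_roll(3600, 0, 0))            # open for an hour: roll
print(should_roll(10, 134217728, 0))      # reached 128 MB: roll
print(should_roll(10, 1024, 1_000_000))   # event count is ignored when rollCount = 0
```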
# Consumer Flume Configuration
Flume configuration overview
The Flume configuration is as follows.
Create the file kafka-flume-hdfs.conf in /opt/module/flume/conf on ha03:

    ## Components
    a1.sources=r1
    a1.channels=c1
    a1.sinks=k1

    ## source1
    a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
    a1.sources.r1.batchSize = 5000
    a1.sources.r1.batchDurationMillis = 2000
    a1.sources.r1.kafka.bootstrap.servers = ha01:9092,ha02:9092,ha03:9092
    a1.sources.r1.kafka.topics=topic_log
    a1.sources.r1.interceptors = i1
    a1.sources.r1.interceptors.i1.type = top.damoncai.flume.interceptor.TimeStampInterceptor$Builder

    ## channel1
    a1.channels.c1.type = file
    a1.channels.c1.checkpointDir = /opt/module/flume/checkpoint/behavior1
    a1.channels.c1.dataDirs = /opt/module/flume/data/behavior1/

    ## sink1
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_log/%Y-%m-%d
    a1.sinks.k1.hdfs.filePrefix = log-
    a1.sinks.k1.hdfs.round = false

    # Control the creation of small files
    a1.sinks.k1.hdfs.rollInterval = 10
    a1.sinks.k1.hdfs.rollSize = 134217728
    a1.sinks.k1.hdfs.rollCount = 0

    ## Write the output as an lzop-compressed stream
    a1.sinks.k1.hdfs.fileType = CompressedStream
    a1.sinks.k1.hdfs.codeC = lzop

    ## Wire the components together
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel= c1

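# Flume Timestamp Interceptor
By default Flume uses the Linux system time as the time in the HDFS output path. If a record is produced at 23:59, it may already be the next day by the time Flume consumes it from Kafka, and that part of the data would then be sent to the next day's HDFS path. The data should instead go to the HDFS path that matches the actual time in the log, so the interceptor below extracts the actual time from each log record.
Approach: intercept the JSON log, parse it with fastjson, and read the actual time ts. Write ts into the event header. The header key must be timestamp, because Flume recognizes the value under this key as the time to use when writing to HDFS.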
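
To show why the event time matters, the Python sketch below derives the dated directory from the ts field of a log line instead of the current clock. It only mirrors the idea; the function name is hypothetical, the base path is the one used in the consumer configuration, and resolving the date in UTC is an assumption made so the example is reproducible.

```python
import json
from datetime import datetime, timezone


def hdfs_path_for(log_line, tz=timezone.utc):
    """Build the dated HDFS directory from the log's own `ts` (epoch milliseconds)."""
    ts_ms = int(json.loads(log_line)["ts"])
    day = datetime.fromtimestamp(ts_ms / 1000, tz=tz).strftime("%Y-%m-%d")
    return "/origin_data/gmall/log/topic_log/" + day


# A record produced at 23:59:59 UTC on 2021-08-02 stays in that day's
# directory even if it is consumed after midnight.
print(hdfs_path_for('{"ts": 1627948799000}'))
```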
Create the TimeStampInterceptor class in the top.damoncai.flume.interceptor package:

    package top.damoncai.flume.interceptor;

    import com.alibaba.fastjson.JSONObject;
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;

    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    public class TimeStampInterceptor implements Interceptor {

        private ArrayList<Event> events = new ArrayList<>();

        @Override
        public void initialize() {
        }

        // Copy the log's own ts field into the "timestamp" header
        @Override
        public Event intercept(Event event) {
            Map<String, String> headers = event.getHeaders();
            String log = new String(event.getBody(), StandardCharsets.UTF_8);
            JSONObject jsonObject = JSONObject.parseObject(log);
            String ts = jsonObject.getString("ts");
            headers.put("timestamp", ts);
            return event;
        }

        @Override
        public List<Event> intercept(List<Event> list) {
            events.clear();
            for (Event event : list) {
                events.add(intercept(event));
            }
            return events;
        }

        @Override
        public void close() {
        }

        public static class Builder implements Interceptor.Builder {
            @Override
            public Interceptor build() {
                return new TimeStampInterceptor();
            }

            @Override
            public void configure(Context context) {
            }
        }
    }

Repackage the project
Put the packaged jar in /opt/module/flume/lib on ha01 first. (If an earlier jar is already there, delete it.)
Distribute it to ha02 and ha03.
# Consumer Flume Start/Stop Script
Create the script f2.sh in /home/damoncai/bin on ha01:

    #!/bin/bash
    case $1 in
    "start"){
        for i in ha03
        do
            echo " -------- starting consumer flume on $i -------"
            ssh $i "nohup /opt/module/flume/bin/flume-ng agent --conf-file /opt/module/flume/conf/kafka-flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,LOGFILE >/opt/module/flume/log2.txt 2>&1 &"
        done
    };;
    "stop"){
        for i in ha03
        do
            echo " -------- stopping consumer flume on $i -------"
            ssh $i "ps -ef | grep kafka-flume-hdfs | grep -v grep | awk '{print \$2}' | xargs -n1 kill"
        done
    };;
    esac

Make the script executable.
Start or stop the consumer Flume with f2.sh.
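# Project Experience: Flume Memory Tuning
Problem: starting the consumer Flume throws the following exception

    ERROR hdfs.HDFSEventSink: process failed
    java.lang.OutOfMemoryError: GC overhead limit exceeded

Solution
Add the following to /opt/module/flume/conf/flume-env.sh on ha01:

    export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"

Sync the configuration to ha02 and ha03.
Flume memory settings and tuning
The JVM heap is usually set to 4 GB or more.
It is best to set -Xmx and -Xms to the same value to reduce the performance impact of heap resizing; if they differ, frequent full GCs are likely.
-Xms is the minimum JVM heap size, allocated at startup; -Xmx is the maximum allowed JVM heap size, allocated on demand. If the two differ, the heap can be too small during initialization, which triggers frequent full GCs.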
# Collection Pipeline Start/Stop Script
Create the script cluster.sh in /home/damoncai/bin:

    #!/bin/bash
    case $1 in
    "start"){
        echo ================== starting cluster ==================

        # Start the Zookeeper cluster
        zk.sh start

        # Start the Hadoop cluster
        hdp.sh start

        # Start the Kafka cluster
        kf.sh start

        # Start the collection Flume agents
        f1.sh start

        # Start the consumer Flume agent
        f2.sh start
    };;
    "stop"){
        echo ================== stopping cluster ==================

        # Stop the consumer Flume agent
        f2.sh stop

        # Stop the collection Flume agents
        f1.sh stop

        # Stop the Kafka cluster
        kf.sh stop

        # Stop the Hadoop cluster
        hdp.sh stop

        # Stop the Zookeeper cluster
        zk.sh stop
    };;
    esac

Make the script executable.
Start or stop the pipeline with the script.
# Common Problems and Solutions
# Page Does Not Show Complete Information
Problem
The 2NN page at http://ha03:9868 shows no detailed information.
Solution
Press F12 in the browser to see the cause; the bug is at line 61.
Locate the file to change:

    cd /opt/module/hadoop-3.1.3/share/hadoop/hdfs/webapps/static
    vim dfs-dust.js
    :set nu

Change line 61 to:

    return new Date(Number(v)).toLocaleString();

Distribute dfs-dust.js:

    xsync dfs-dust.js

Force-refresh the page at http://ha03:9868/status.html
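# Business data collection module
# MySQL installation
Installation host: ha01
# Preparing the installation packages
Upload the installation packages and the JDBC driver to /opt/software, 6 files in total
01_mysql-community-common-5.7.16-1.el7.x86_64.rpm
02_mysql-community-libs-5.7.16-1.el7.x86_64.rpm
03_mysql-community-libs-compat-5.7.16-1.el7.x86_64.rpm
04_mysql-community-client-5.7.16-1.el7.x86_64.rpm
05_mysql-community-server-5.7.16-1.el7.x86_64.rpm
mysql-connector-java-5.1.27-bin.jar
On a virtual machine, follow these steps
Uninstall the bundled mysql-libs (if MySQL was installed before, remove all of it)
rpm -qa | grep -i -E mysql\|mariadb | xargs -n1 sudo rpm -e --nodeps
On an Alibaba Cloud server, follow these steps
Note: Alibaba Cloud servers run a minimal Linux installation that lacks the tools below, so they have to be installed
Remove the MySQL dependency. MySQL is not installed on the machine, but this step must not be skipped
sudo yum remove mysql-libs
Download and install the dependencies
sudo yum install libaio
sudo yum -y install autoconf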
# Installing MySQL
Install the MySQL dependencies
sudo rpm -ivh 01_mysql-community-common-5.7.16-1.el7.x86_64.rpm
sudo rpm -ivh 02_mysql-community-libs-5.7.16-1.el7.x86_64.rpm
sudo rpm -ivh 03_mysql-community-libs-compat-5.7.16-1.el7.x86_64.rpm
Install mysql-client
sudo rpm -ivh 04_mysql-community-client-5.7.16-1.el7.x86_64.rpm
Install mysql-server
sudo rpm -ivh 05_mysql-community-server-5.7.16-1.el7.x86_64.rpm
Note: the output below may appear. The NOKEY warning arises because yum installed an older set of GPG keys; since rpm 4.1, package signatures are checked automatically on install or upgrade. The error itself reports that the libaio dependency is missing.
warning: 05_mysql-community-server-5.7.16-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
error: Failed dependencies:
libaio.so.1()(64bit) is needed by mysql-community-server-5.7.16-1.el7.x86_64
Workaround
sudo rpm -ivh 05_mysql-community-server-5.7.16-1.el7.x86_64.rpm --force --nodeps
Start MySQL
sudo systemctl start mysqld
Look up the MySQL password
sudo cat /var/log/mysqld.log | grep password
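# Configuring MySQL
The goal is that root, with the password, can log in to MySQL from any host.
Log in with the password just found (if it is rejected, wrap the password in single quotes)
mysql -uroot -p'password'
Set a complex password (MySQL's password policy requires it to be sufficiently complex)
set password=password("Qs23=zs32");
Relax the MySQL password policy
set global validate_password_length=4;
set global validate_password_policy=0;
Set a simple, easy-to-remember password
set password=password("000000");
Switch to the mysql database
use mysql
Query the user table
select user, host from user;
Update the user table, setting the host column to %
update user set host="%" where user="root";
Flush privileges
flush privileges;
Exit
quit;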
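# Generating business data
# Connecting to MySQL
# Importing the SQL script
# Generating business data
On ha01, create the db_log directory under /opt/module/
Upload gmall2020-mock-db-2021-01-22.jar and application.properties to /opt/module/db_log on ha01.
Edit application.properties as required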
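logging.level.root=info
spring.datasource.driver-class-name=com.mysql.jdbc.Driver
spring.datasource.url=jdbc:mysql://ha01:3306/gmall?characterEncoding=utf-8&useSSL=false&serverTimezone=GMT%2B8
spring.datasource.username=root
spring.datasource.password=000000
logging.pattern.console=%m%n
mybatis-plus.global-config.db-config.field-strategy=not_null
#Business date
mock.date=2020-06-14
#Whether to reset. Note: must be 1 on the first run; afterwards leave it off unless a reset is needed
mock.clear=1
#Whether to reset users. Note: must be 1 on the first run; afterwards leave it off unless a reset is needed
mock.clear.user=1
#Number of new users to generate
mock.user.count=100
#Percentage of male users
mock.user.male-rate=20
#Probability that user data changes
mock.user.update-rate=20
#Percentage of favorites that are cancelled
mock.favor.cancel-rate=10
#Number of favorites
mock.favor.count=100
#Probability that a user adds items to the cart
mock.cart.user-rate=50
#Maximum number of distinct products a user adds to the cart each time
mock.cart.max-sku-count=8
#Maximum quantity bought per product
mock.cart.max-sku-num=3
#Cart source: user search, product promotion, smart recommendation, promotional campaign
mock.cart.source-type-rate=60:20:10:10
#Percentage of users who place an order
mock.order.user-rate=50
#Percentage of cart items that users purchase
mock.order.sku-rate=50
#Whether orders take part in activities
mock.order.join-activity=1
#Whether orders use coupons
mock.order.use-coupon=1
#Number of users who claim coupons
mock.coupon.user-count=100
#Payment rate
mock.payment.rate=70
#Payment method: Alipay:WeChat:UnionPay
mock.payment.payment-type=30:60:10
#Review ratio: good:neutral:bad:automatic
mock.comment.appraise-rate=30:10:10:50
#Refund reason ratio: quality problem, product does not match description, out of stock, wrong size, ordered by mistake, no longer wanted, other
mock.refund.reason-rate=30:10:20:5:15:5:5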
In that directory, run the following command to generate data for 2020-06-14:
java -jar gmall2020-mock-db-2021-01-22.jar
Check the gmall database to confirm that data for 2020-06-14 has appeared
# Sqoop installation
Official site: http://sqoop.apache.org
**Download:** http://mirrors.hust.edu.cn/apache/sqoop/1.4.6/
Upload the package sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz to /opt/software on ha01
Extract the Sqoop package
Go to /opt/module/sqoop/conf and rename the configuration file
mv sqoop-env-template.sh sqoop-env.sh
Edit the configuration file
vim sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/module/hadoop-3.1.3
export HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3
export HIVE_HOME=/opt/module/hive
export ZOOKEEPER_HOME=/opt/module/zookeeper-3.5.7
export ZOOCFGDIR=/opt/module/zookeeper-3.5.7/conf
Copy the JDBC driver
Upload mysql-connector-java-5.1.48.jar to /opt/software
cp mysql-connector-java-5.1.48.jar /opt/module/sqoop/lib/
Verify Sqoop
bin/sqoop help
Test whether Sqoop can connect to the database
bin/sqoop list-databases --connect jdbc:mysql://ha01:3306/ --username root --password 000000
Basic Sqoop usage
bin/sqoop import \
--connect jdbc:mysql://ha01:3306/gmall \
--username root \
--password 000000 \
--table user_info \
--columns id,login_name \
--where "id>=10 and id<=30" \
--target-dir /test \
--delete-target-dir \
--fields-terminated-by '\t' \
--num-mappers 2 \
--split-by id
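# Synchronization strategies
Data synchronization strategies fall into four types: full sync, incremental sync, new-and-changed sync, and special cases
- Full table: stores the complete data.
- Incremental table: stores newly added data.
- New-and-changed table: stores newly added data and changed data.
- Special table: only needs to be stored once.
# Full sync strategy
# Incremental sync strategy
# New-and-changed strategy
# Special strategy
Certain special tables need not follow the strategies above. For example, tables that never change (region, province, ethnicity) can be stored once as a fixed copy.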
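In practice the four strategies differ mainly in the row filter applied when a table is extracted for a given day. The sketch below prints an illustrative filter for each strategy. The helper name `sync_filter` is made up, and `create_time` and `operate_time` are assumed column names, chosen because the sync scripts later in this document filter on those columns.

```shell
#!/bin/bash
# Hypothetical helper: prints the WHERE condition a daily extract would use.
# create_time / operate_time are assumed column names.
sync_filter() {
    strategy=$1
    day=$2
    case $strategy in
    full | special)
        # every row is taken (special tables are extracted once, not daily)
        echo "1=1"
        ;;
    incremental)
        # only rows created on that day
        echo "date_format(create_time,'%Y-%m-%d')='$day'"
        ;;
    new_and_changed)
        # rows created or modified on that day
        echo "(date_format(create_time,'%Y-%m-%d')='$day' or date_format(operate_time,'%Y-%m-%d')='$day')"
        ;;
    *)
        echo "unknown strategy: $strategy" >&2
        return 1
        ;;
    esac
}

sync_filter new_and_changed 2020-06-15
```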
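# Importing business data into HDFS
# Choosing a sync strategy for each table
In production, some small companies import every table in full to keep things simple.
Medium and large companies have larger data volumes and follow the sync strategies strictly.
# First-day business data sync script
# Writing the script
Create the script in /home/damoncai/bin
vim mysql_to_hdfs_init.sh
Add the following content: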
#!/bin/bash

APP=gmall
sqoop=/opt/module/sqoop/bin/sqoop

if [ -n "$2" ] ;then
    do_date=$2
else
    echo "Please pass a date argument"
    exit
fi

import_data(){
$sqoop import \
--connect jdbc:mysql://ha01:3306/$APP \
--username root \
--password 000000 \
--target-dir /origin_data/$APP/db/$1/$do_date \
--delete-target-dir \
--query "$2 where \$CONDITIONS" \
--num-mappers 1 \
--fields-terminated-by '\t' \
--compress \
--compression-codec lzop \
--null-string '\\N' \
--null-non-string '\\N'

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /origin_data/$APP/db/$1/$do_date
}

import_order_info(){
  import_data order_info "select id, total_amount, order_status, user_id, payment_way, delivery_address, out_trade_no, create_time, operate_time, expire_time, tracking_no, province_id, activity_reduce_amount, coupon_reduce_amount, original_total_amount, feight_fee, feight_fee_reduce from order_info"
}

import_coupon_use(){
  import_data coupon_use "select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time from coupon_use"
}

import_order_status_log(){
  import_data order_status_log "select id, order_id, order_status, operate_time from order_status_log"
}

import_user_info(){
  import_data "user_info" "select id, login_name, nick_name, name, phone_num, email, user_level, birthday, gender, create_time, operate_time from user_info"
}

import_order_detail(){
  import_data order_detail "select id, order_id, sku_id, sku_name, order_price, sku_num, create_time, source_type, source_id, split_total_amount, split_activity_amount, split_coupon_amount from order_detail"
}

import_payment_info(){
  import_data "payment_info" "select id, out_trade_no, order_id, user_id, payment_type, trade_no, total_amount, subject, payment_status, create_time, callback_time from payment_info"
}

import_comment_info(){
  import_data comment_info "select id, user_id, sku_id, spu_id, order_id, appraise, create_time from comment_info"
}

import_order_refund_info(){
  import_data order_refund_info "select id, user_id, order_id, sku_id, refund_type, refund_num, refund_amount, refund_reason_type, refund_status, create_time from order_refund_info"
}

import_sku_info(){
  import_data sku_info "select id, spu_id, price, sku_name, sku_desc, weight, tm_id, category3_id, is_sale, create_time from sku_info"
}

import_base_category1(){
  import_data "base_category1" "select id, name from base_category1"
}

import_base_category2(){
  import_data "base_category2" "select id, name, category1_id from base_category2"
}

import_base_category3(){
  import_data "base_category3" "select id, name, category2_id from base_category3"
}

import_base_province(){
  import_data base_province "select id, name, region_id, area_code, iso_code, iso_3166_2 from base_province"
}

import_base_region(){
  import_data base_region "select id, region_name from base_region"
}

import_base_trademark(){
  import_data base_trademark "select id, tm_name from base_trademark"
}

import_spu_info(){
  import_data spu_info "select id, spu_name, category3_id, tm_id from spu_info"
}

import_favor_info(){
  import_data favor_info "select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time from favor_info"
}

import_cart_info(){
  import_data cart_info "select id, user_id, sku_id, cart_price, sku_num, sku_name, create_time, operate_time, is_ordered, order_time, source_type, source_id from cart_info"
}

import_coupon_info(){
  import_data coupon_info "select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time from coupon_info"
}

import_activity_info(){
  import_data activity_info "select id, activity_name, activity_type, start_time, end_time, create_time from activity_info"
}

import_activity_rule(){
  import_data activity_rule "select id, activity_id, activity_type, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level from activity_rule"
}

import_base_dic(){
  import_data base_dic "select dic_code, dic_name, parent_code, create_time, operate_time from base_dic"
}

import_order_detail_activity(){
  import_data order_detail_activity "select id, order_id, order_detail_id, activity_id, activity_rule_id, sku_id, create_time from order_detail_activity"
}

import_order_detail_coupon(){
  import_data order_detail_coupon "select id, order_id, order_detail_id, coupon_id, coupon_use_id, sku_id, create_time from order_detail_coupon"
}

import_refund_payment(){
  import_data refund_payment "select id, out_trade_no, order_id, sku_id, payment_type, trade_no, total_amount, subject, refund_status, create_time, callback_time from refund_payment"
}

import_sku_attr_value(){
  import_data sku_attr_value "select id, attr_id, value_id, sku_id, attr_name, value_name from sku_attr_value"
}

import_sku_sale_attr_value(){
  import_data sku_sale_attr_value "select id, sku_id, spu_id, sale_attr_value_id, sale_attr_id, sale_attr_name, sale_attr_value_name from sku_sale_attr_value"
}

case $1 in
  "order_info") import_order_info ;;
  "base_category1") import_base_category1 ;;
  "base_category2") import_base_category2 ;;
  "base_category3") import_base_category3 ;;
  "order_detail") import_order_detail ;;
  "sku_info") import_sku_info ;;
  "user_info") import_user_info ;;
  "payment_info") import_payment_info ;;
  "base_province") import_base_province ;;
  "base_region") import_base_region ;;
  "base_trademark") import_base_trademark ;;
  "activity_info") import_activity_info ;;
  "cart_info") import_cart_info ;;
  "comment_info") import_comment_info ;;
  "coupon_info") import_coupon_info ;;
  "coupon_use") import_coupon_use ;;
  "favor_info") import_favor_info ;;
  "order_refund_info") import_order_refund_info ;;
  "order_status_log") import_order_status_log ;;
  "spu_info") import_spu_info ;;
  "activity_rule") import_activity_rule ;;
  "base_dic") import_base_dic ;;
  "order_detail_activity") import_order_detail_activity ;;
  "order_detail_coupon") import_order_detail_coupon ;;
  "refund_payment") import_refund_payment ;;
  "sku_attr_value") import_sku_attr_value ;;
  "sku_sale_attr_value") import_sku_sale_attr_value ;;
  "all")
    import_base_category1
    import_base_category2
    import_base_category3
    import_order_info
    import_order_detail
    import_sku_info
    import_user_info
    import_payment_info
    import_base_region
    import_base_province
    import_base_trademark
    import_activity_info
    import_cart_info
    import_comment_info
    import_coupon_use
    import_coupon_info
    import_favor_info
    import_order_refund_info
    import_order_status_log
    import_spu_info
    import_activity_rule
    import_base_dic
    import_order_detail_activity
    import_order_detail_coupon
    import_refund_payment
    import_sku_attr_value
    import_sku_sale_attr_value
  ;;
esac
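Note 1:
[ -n value ] tests whether the value is non-empty
-- if the value is non-empty, it returns true
-- if the value is empty, it returns false
Note 2:
For usage of the date command, see date --help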
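The two notes can be tried in isolation. The sketch below wraps the "use the argument if present, otherwise yesterday" logic in a function; the name `resolve_date` is made up, and `date -d` is GNU date syntax, which is an assumption about the platform.

```shell
#!/bin/bash
# Prints the first argument if it is non-empty, otherwise yesterday's date
# in YYYY-MM-DD form. `date -d` is GNU date syntax.
resolve_date() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        date -d '-1 day' +%F
    fi
}

resolve_date 2020-06-14   # prints the date that was passed in
resolve_date              # prints yesterday's date
```

Keep the double quotes around `"$1"`: with an unquoted empty variable the test collapses to `[ -n ]`, which is true, so the empty case would never be detected.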
Make the script executable
chmod +x mysql_to_hdfs_init.sh
Run the script
mysql_to_hdfs_init.sh all 2020-06-14
# Daily business data sync script
Writing the script
Create the script in /home/damoncai/bin
vim mysql_to_hdfs.sh
Add the following content
#!/bin/bash

APP=gmall
sqoop=/opt/module/sqoop/bin/sqoop

if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d '-1 day' +%F`
fi

import_data(){
$sqoop import \
--connect jdbc:mysql://ha01:3306/$APP \
--username root \
--password 000000 \
--target-dir /origin_data/$APP/db/$1/$do_date \
--delete-target-dir \
--query "$2 and \$CONDITIONS" \
--num-mappers 1 \
--fields-terminated-by '\t' \
--compress \
--compression-codec lzop \
--null-string '\\N' \
--null-non-string '\\N'

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /origin_data/$APP/db/$1/$do_date
}

import_order_info(){
  import_data order_info "select id, total_amount, order_status, user_id, payment_way, delivery_address, out_trade_no, create_time, operate_time, expire_time, tracking_no, province_id, activity_reduce_amount, coupon_reduce_amount, original_total_amount, feight_fee, feight_fee_reduce from order_info where (date_format(create_time,'%Y-%m-%d')='$do_date' or date_format(operate_time,'%Y-%m-%d')='$do_date')"
}

import_coupon_use(){
  import_data coupon_use "select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time from coupon_use where (date_format(get_time,'%Y-%m-%d')='$do_date' or date_format(using_time,'%Y-%m-%d')='$do_date' or date_format(used_time,'%Y-%m-%d')='$do_date' or date_format(expire_time,'%Y-%m-%d')='$do_date')"
}

import_order_status_log(){
  import_data order_status_log "select id, order_id, order_status, operate_time from order_status_log where date_format(operate_time,'%Y-%m-%d')='$do_date'"
}

import_user_info(){
  import_data "user_info" "select id, login_name, nick_name, name, phone_num, email, user_level, birthday, gender, create_time, operate_time from user_info where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date' or DATE_FORMAT(operate_time,'%Y-%m-%d')='$do_date')"
}

import_order_detail(){
  import_data order_detail "select id, order_id, sku_id, sku_name, order_price, sku_num, create_time, source_type, source_id, split_total_amount, split_activity_amount, split_coupon_amount from order_detail where DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date'"
}

import_payment_info(){
  import_data "payment_info" "select id, out_trade_no, order_id, user_id, payment_type, trade_no, total_amount, subject, payment_status, create_time, callback_time from payment_info where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date' or DATE_FORMAT(callback_time,'%Y-%m-%d')='$do_date')"
}

import_comment_info(){
  import_data comment_info "select id, user_id, sku_id, spu_id, order_id, appraise, create_time from comment_info where date_format(create_time,'%Y-%m-%d')='$do_date'"
}

import_order_refund_info(){
  import_data order_refund_info "select id, user_id, order_id, sku_id, refund_type, refund_num, refund_amount, refund_reason_type, refund_status, create_time from order_refund_info where date_format(create_time,'%Y-%m-%d')='$do_date'"
}

import_sku_info(){
  import_data sku_info "select id, spu_id, price, sku_name, sku_desc, weight, tm_id, category3_id, is_sale, create_time from sku_info where 1=1"
}

import_base_category1(){
  import_data "base_category1" "select id, name from base_category1 where 1=1"
}

import_base_category2(){
  import_data "base_category2" "select id, name, category1_id from base_category2 where 1=1"
}

import_base_category3(){
  import_data "base_category3" "select id, name, category2_id from base_category3 where 1=1"
}

import_base_province(){
  import_data base_province "select id, name, region_id, area_code, iso_code, iso_3166_2 from base_province where 1=1"
}

import_base_region(){
  import_data base_region "select id, region_name from base_region where 1=1"
}

import_base_trademark(){
  import_data base_trademark "select id, tm_name from base_trademark where 1=1"
}

import_spu_info(){
  import_data spu_info "select id, spu_name, category3_id, tm_id from spu_info where 1=1"
}

import_favor_info(){
  import_data favor_info "select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time from favor_info where 1=1"
}

import_cart_info(){
  import_data cart_info "select id, user_id, sku_id, cart_price, sku_num, sku_name, create_time, operate_time, is_ordered, order_time, source_type, source_id from cart_info where 1=1"
}

import_coupon_info(){
  import_data coupon_info "select id, coupon_name, coupon_type, condition_amount, condition_num, activity_id, benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count, start_time, end_time, operate_time, expire_time from coupon_info where 1=1"
}

import_activity_info(){
  import_data activity_info "select id, activity_name, activity_type, start_time, end_time, create_time from activity_info where 1=1"
}

import_activity_rule(){
  import_data activity_rule "select id, activity_id, activity_type, condition_amount, condition_num, benefit_amount, benefit_discount, benefit_level from activity_rule where 1=1"
}

import_base_dic(){
  import_data base_dic "select dic_code, dic_name, parent_code, create_time, operate_time from base_dic where 1=1"
}

import_order_detail_activity(){
  import_data order_detail_activity "select id, order_id, order_detail_id, activity_id, activity_rule_id, sku_id, create_time from order_detail_activity where date_format(create_time,'%Y-%m-%d')='$do_date'"
}

import_order_detail_coupon(){
  import_data order_detail_coupon "select id, order_id, order_detail_id, coupon_id, coupon_use_id, sku_id, create_time from order_detail_coupon where date_format(create_time,'%Y-%m-%d')='$do_date'"
}

import_refund_payment(){
  import_data refund_payment "select id, out_trade_no, order_id, sku_id, payment_type, trade_no, total_amount, subject, refund_status, create_time, callback_time from refund_payment where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date' or DATE_FORMAT(callback_time,'%Y-%m-%d')='$do_date')"
}

import_sku_attr_value(){
  import_data sku_attr_value "select id, attr_id, value_id, sku_id, attr_name, value_name from sku_attr_value where 1=1"
}

import_sku_sale_attr_value(){
  import_data sku_sale_attr_value "select id, sku_id, spu_id, sale_attr_value_id, sale_attr_id, sale_attr_name, sale_attr_value_name from sku_sale_attr_value where 1=1"
}

case $1 in
  "order_info") import_order_info ;;
  "base_category1") import_base_category1 ;;
  "base_category2") import_base_category2 ;;
  "base_category3") import_base_category3 ;;
  "order_detail") import_order_detail ;;
  "sku_info") import_sku_info ;;
  "user_info") import_user_info ;;
  "payment_info") import_payment_info ;;
  "base_province") import_base_province ;;
  "base_region") import_base_region ;;
  "base_trademark") import_base_trademark ;;
  "activity_info") import_activity_info ;;
  "cart_info") import_cart_info ;;
  "comment_info") import_comment_info ;;
  "coupon_info") import_coupon_info ;;
  "coupon_use") import_coupon_use ;;
  "favor_info") import_favor_info ;;
  "order_refund_info") import_order_refund_info ;;
  "order_status_log") import_order_status_log ;;
  "spu_info") import_spu_info ;;
  "activity_rule") import_activity_rule ;;
  "base_dic") import_base_dic ;;
  "order_detail_activity") import_order_detail_activity ;;
  "order_detail_coupon") import_order_detail_coupon ;;
  "refund_payment") import_refund_payment ;;
  "sku_attr_value") import_sku_attr_value ;;
  "sku_sale_attr_value") import_sku_sale_attr_value ;;
  "all")
    import_base_category1
    import_base_category2
    import_base_category3
    import_order_info
    import_order_detail
    import_sku_info
    import_user_info
    import_payment_info
    import_base_trademark
    import_activity_info
    import_cart_info
    import_comment_info
    import_coupon_use
    import_coupon_info
    import_favor_info
    import_order_refund_info
    import_order_status_log
    import_spu_info
    import_activity_rule
    import_base_dic
    import_order_detail_activity
    import_order_detail_coupon
    import_refund_payment
    import_sku_attr_value
    import_sku_sale_attr_value
  ;;
esac
Make the script executable
chmod +x mysql_to_hdfs.sh
Run the script
mysql_to_hdfs.sh all 2020-06-15
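# Project experience
Hive stores NULL as "\N" in its underlying files, whereas MySQL stores NULL as NULL. To keep the two sides consistent, use --input-null-string and --input-null-non-string when exporting data, and --null-string and --null-non-string when importing data.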
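The four options pair up by direction. The sketch below prints the pair for each direction, written the way the import scripts in this document type them (with two backslashes inside single quotes). The helper name `null_flags` is made up for this example; the option names are the Sqoop ones named above.

```shell
#!/bin/bash
# Hypothetical helper: prints the Sqoop null-handling options per direction.
# import: MySQL NULL is written to HDFS as \N
# export: \N in HDFS is written back to MySQL as NULL
null_flags() {
    case $1 in
    import)
        printf '%s\n' "--null-string '\\\\N' --null-non-string '\\\\N'"
        ;;
    export)
        printf '%s\n' "--input-null-string '\\\\N' --input-null-non-string '\\\\N'"
        ;;
    *)
        echo "usage: null_flags import|export" >&2
        return 1
        ;;
    esac
}

null_flags import
null_flags export
```

The "string" variant applies to string-typed columns and the "non-string" variant to all other column types, so both are normally given together.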
# Preparing the data environment
# Installing and deploying Hive
Upload apache-hive-3.1.2-bin.tar.gz to /opt/software on the Linux host
Extract apache-hive-3.1.2-bin.tar.gz into /opt/module/
Rename the extracted directory apache-hive-3.1.2-bin to hive
Edit /etc/profile.d/my_env.sh and add the environment variables
sudo vim /etc/profile.d/my_env.sh
#HIVE_HOME
export HIVE_HOME=/opt/module/hive
export PATH=$PATH:$HIVE_HOME/bin
Source /etc/profile.d/my_env.sh so the environment variables take effect
source /etc/profile.d/my_env.sh
Resolve the logging JAR conflict: go to /opt/module/hive/lib and rename the conflicting JAR
mv log4j-slf4j-impl-2.10.0.jar log4j-slf4j-impl-2.10.0.jar.bak
# Storing the Hive metadata in MySQL
Copy the driver (copy the MySQL JDBC driver into Hive's lib directory)
cp /opt/software/mysql-connector-java-5.1.27.jar /opt/module/hive/lib/
Point the Metastore at MySQL
Create hive-site.xml in $HIVE_HOME/conf
vim hive-site.xml
Add the following content
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://ha01:3306/metastore?useSSL=false</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>000000</value>
    </property>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/user/hive/warehouse</value>
    </property>
    <property>
        <name>hive.metastore.schema.verification</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.server2.thrift.port</name>
        <value>10000</value>
    </property>
    <property>
        <name>hive.server2.thrift.bind.host</name>
        <value>ha01</value>
    </property>
    <property>
        <name>hive.metastore.event.db.notification.api.auth</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.cli.print.header</name>
        <value>true</value>
    </property>
    <property>
        <name>hive.cli.print.current.db</name>
        <value>true</value>
    </property>
</configuration>
# Starting Hive
Log in to MySQL
mysql -uroot -p000000
Create the Hive metastore database
create database metastore;
Initialize the Hive metastore schema
schematool -initSchema -dbType mysql -verbose
Start the Hive client
bin/hive
List the databases
show databases;
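# Warehouse layering
# Why layer the warehouse
# Data marts versus data warehouses
# Warehouse naming conventions
# Table naming
ODS-layer tables are named ods_ plus the table name
DIM-layer tables are named dim_ plus the table name
DWD-layer tables are named dwd_ plus the table name
DWS-layer tables are named dws_ plus the table name
DWT-layer tables are named dwt_ plus the table name
ADS-layer tables are named ads_ plus the table name
Temporary tables are named tmp_ plus the table name
# Script naming
- source_to_target_db/log.sh
- User-behavior scripts end in log; business-data scripts end in db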
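The naming rules are mechanical enough to express as two small functions. This is only a sketch of the conventions listed above; the helper names `table_name` and `script_name` are made up for this example.

```shell
#!/bin/bash
# Hypothetical helpers that apply the naming conventions above.

# table_name LAYER NAME  ->  for example ods_order_info
table_name() {
    layer=$1
    name=$2
    case $layer in
    ods | dim | dwd | dws | dwt | ads | tmp)
        echo "${layer}_${name}"
        ;;
    *)
        echo "unknown layer: $layer" >&2
        return 1
        ;;
    esac
}

# script_name SOURCE TARGET KIND  ->  for example hdfs_to_ods_log.sh
# KIND is "log" for user-behavior data and "db" for business data.
script_name() {
    src=$1
    dst=$2
    kind=$3
    case $kind in
    db | log)
        echo "${src}_to_${dst}_${kind}.sh"
        ;;
    *)
        echo "kind must be db or log" >&2
        return 1
        ;;
    esac
}

table_name ods order_info
script_name hdfs ods log
```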
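# Column types
- Quantities are bigint
- Amounts are decimal(16, 2): 16 significant digits, 2 of them after the decimal point
- Strings (names, descriptions, and so on) are string
- Primary and foreign keys are string
- Timestamps are bigint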
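# Data warehouse theory
# Normal form theory
# The concept of normal forms
- Data modeling has to follow certain rules; in relational modeling, those rules are the normal forms.
- Purpose: applying normal forms reduces data redundancy.
- Drawback: retrieving data requires joins to assemble the final result.
- Classification: the normal forms in use are first normal form (1NF), second normal form (2NF), third normal form (3NF), Boyce-Codd normal form (BCNF), fourth normal form (4NF), and fifth normal form (5NF).
# Functional dependency
# Telling the three normal forms apart
First normal form
Second normal form
Third normal form
# Relational modeling and dimensional modeling
Relational modeling and dimensional modeling are two data warehouse modeling techniques. Relational modeling is advocated by Bill Inmon; dimensional modeling is advocated by Ralph Kimball.
# Relational modeling
Relational modeling abstracts complex data into two concepts, entities and relationships, and represents them in normalized form. As the figure shows, a relational model is rather loose and fragmented, with many physical tables.
# Dimensional modeling
As the figure shows, a dimensional model is comparatively clear and concise.
A dimensional model starts from data analysis and does not follow third normal form, so the data carries some redundancy. It is business-oriented and presents the business through fact tables and dimension tables. The table structure is simple, so queries are simple and efficient.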
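# Dimension tables and fact tables (key topic)
# Dimension tables
Dimension table: generally holds descriptive information about facts. Each dimension table corresponds to an object or concept in the real world, for example user, product, date, or region.
Characteristics of dimension tables:
- Wide (many attributes, many columns)
- Relatively few rows compared with fact tables: usually fewer than 100,000
- Relatively static content: code tables
# Fact tables
Each row in a fact table represents one business event (placing an order, paying, refunding, reviewing, and so on). The term "fact" refers to the measures of a business event (countable occurrences, quantities, amounts, and so on). For example, on May 21, 2020, teacher Songsong spent 250 yuan on JD.com on one bottle of Haigou Renshen pills. Dimension tables: time, user, product, merchant. Fact table: 250 yuan, one bottle.
Each fact table row contains additive numeric measures and foreign keys that link to dimension tables, usually two or more foreign keys.
Characteristics of fact tables:
- Very large
- Relatively narrow: few columns (mainly foreign-key ids and measures)
- Change frequently, with many new rows added every day
# Transaction fact tables
Each row corresponds to one transaction or event, such as a sales order record or a payment record. Once the transaction is committed and the row is inserted, the data is not changed again; the table is updated incrementally.
# Periodic snapshot fact tables
A periodic snapshot fact table does not keep all data, only data at fixed intervals, such as daily or monthly sales, or the monthly account balance.
Take the shopping cart: items are added and removed and it can change at any moment, but what matters more is how many items it holds at the end of each day, which makes later statistical analysis easier.
# Accumulating snapshot fact tables
**Accumulating snapshot fact tables track how a business fact changes.** For example, the warehouse may need to accumulate or store the timestamps of each stage of an order, from placement through packing, shipping, and receipt, in order to track the progress of the order's lifecycle. As the business process advances, the fact table row is updated accordingly.
# Types of dimensional model
Dimensional modeling further divides into three models: star schema, snowflake schema, and constellation schema.
# Star schema
# Snowflake schema
# Constellation schema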
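# Data warehouse modeling (essential)
# ODS layer
- User-behavior data on HDFS
- Business data on HDFS
- How should the user-behavior data and business data on HDFS be handled?
- Keep the data in its original form with no modification, so that it serves as a backup
- Compress the data to save disk space (for example, 100 GB of raw data compresses to about 10 GB)
- Create partitioned tables to avoid later full-table scans
# DIM and DWD layers
The DIM and DWD layers are built as a dimensional model, generally using a star schema; the overall result is usually a constellation schema.
Dimensional modeling generally follows four steps:
Select the business process → declare the grain → identify the dimensions → identify the facts
Select the business process
From the business system, pick the business lines of interest, such as ordering, payment, refunds, and logistics. Each business line corresponds to one fact table.
Declare the grain
Data grain is the level of detail or aggregation at which data is kept in the warehouse.
Declaring the grain means defining precisely what one row of the fact table represents. Choose the finest grain possible so that the widest range of requirements can be met.
Typical grain declarations:
One row in the order fact table represents one product item in one order.
One row in the payment fact table represents one payment record.
Identify the dimensions
Dimensions describe business facts, mainly expressing "who, where, when" and similar information.
The rule for identifying dimensions is whether later requirements will analyze metrics along that dimension. For example, to find out at what times, in which regions, and by which users the most orders are placed, the dimensions to identify include time, region, and user.
Identify the facts
Here "fact" means a business measure (counts, quantities, item counts, amounts; values that can be summed), such as order amount or number of orders.
In the DWD layer, modeling is driven by business processes: for each specific business process, build a detail-level fact table at the finest grain. Fact tables may be widened moderately.
The links between fact tables and dimension tables are flexible, but to handle more complex business requirements, link every table that can be linked.
This completes the dimensional modeling of the warehouse; the DWD layer is driven by business processes.
The DWS, DWT, and ADS layers are all driven by requirements and are no longer part of dimensional modeling.
DWS and DWT both hold wide tables, built per subject. A subject is the angle from which a question is viewed, and it corresponds to a dimension table.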
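# DWS and DWT layers
The DWS and DWT layers are together called the wide-table layers. Their design ideas are largely the same and are illustrated by the following case.
Motivating problem: two requirements, counting the orders in each province and summing the order amount in each province.
Approach: both join the province table with the order table, group by province, and then aggregate. The same data is computed twice, and in practice there are many more such cases.
How can the design avoid the repeated computation?
For this scenario, design a region wide table whose primary key is the region ID and whose fields include order count, order amount, payment count, payment amount, and so on. All of these metrics are computed together and the results are stored in the wide table, which effectively avoids repeated computation.
Summary:
- Which wide tables to build: one per dimension
- Fields in a wide table: view the fact tables from each dimension's perspective, focusing on the measures aggregated from the fact tables.
- Difference between DWS and DWT: the DWS layer stores each subject's aggregated behavior for the current day, such as each region's order count and order amount for the day; the DWT layer stores each subject's cumulative behavior, such as each region's order count and order amount over the last 7 days (15, 30, or 60 days).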
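To make the "compute once, store in a wide table" idea concrete, the sketch below prints a HiveQL statement that derives both province metrics in a single pass. It only prints the SQL and does not call Hive. The table and column names (`dws_area_stats_daycount`, `dwd_order_info`, `province_id`, `total_amount`) and the function name are placeholders for illustration, not names defined in this document.

```shell
#!/bin/bash
# Prints a HiveQL statement that fills one day's partition of a region wide
# table. All table and column names here are illustrative placeholders.
area_daycount_sql() {
    day=$1
    cat <<EOF
insert overwrite table dws_area_stats_daycount partition(dt='$day')
select
    province_id,
    count(*)          as order_count,
    sum(total_amount) as order_amount
from dwd_order_info
where dt='$day'
group by province_id;
EOF
}

area_daycount_sql 2020-06-14
```

Both requirements (order count and order amount per province) then read from the stored result instead of each repeating the join and aggregation.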
# ADS layer
Analyzes the metrics of each major subject of the e-commerce system.
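# Setting up the warehouse environment
# Setting up the Hive environment
# Hive execution engines
Hive engines include MR (the default), Tez, and Spark
Hive on Spark: Hive stores the metadata and also parses and optimizes the SQL; the syntax is HQL; the execution engine becomes Spark, which runs the work as RDDs.
Spark on Hive: Hive only stores the metadata; Spark parses and optimizes the SQL; the syntax is Spark SQL; Spark runs the work as RDDs.
# Configuring Hive on Spark
Compatibility notes
Note: Hive 3.1.2 and Spark 3.0.0 as downloaded from the official sites are not compatible by default, because Hive 3.1.2 supports Spark 2.4.5. Hive 3.1.2 therefore has to be recompiled.
Build steps: download the Hive 3.1.2 source from the official site and change the Spark version referenced in the pom file to 3.0.0. If it compiles, package it to obtain the JARs. If it fails, fix the affected methods as the errors indicate until it compiles, then package it.
Deploy Spark on the node where Hive runs
If Spark is already deployed, skip this step, but check that the SPARK_HOME environment variable is set correctly.
Spark download page: http://spark.apache.org/downloads.html
Upload and extract spark-3.0.0-bin-hadoop3.2.tgz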
Configure the SPARK_HOME environment variable
sudo vim /etc/profile.d/my_env.sh
Add the following content
#SPARK_HOME
export SPARK_HOME=/opt/module/spark
export PATH=$PATH:$SPARK_HOME/bin
Source the file so the change takes effect
source /etc/profile.d/my_env.sh
Create the Spark configuration file in Hive
vim /opt/module/hive/conf/spark-defaults.conf
spark.master yarn
spark.eventLog.enabled true
spark.eventLog.dir hdfs://ha01:8020/spark-history
spark.executor.memory 1g
spark.driver.memory 1g
Create the following path on HDFS to store the history logs
hadoop fs -mkdir /spark-history
Upload the "pure" Spark JARs to HDFS
Note 1: the regular Spark 3.0.0 build supports Hive 2.3.7 by default, so using it directly causes compatibility problems with the installed Hive 3.1.2. The pure build of Spark is used instead; it contains no Hadoop or Hive dependencies, which avoids the conflict.
Note 2: Hive jobs are ultimately executed by Spark, and Spark's resources are scheduled by Yarn, so a job may be assigned to any node in the cluster. The Spark dependencies are therefore uploaded to an HDFS path that every node can reach.
Upload and extract spark-3.0.0-bin-without-hadoop.tgz
tar -zxvf /opt/software/spark-3.0.0-bin-without-hadoop.tgz
Upload the pure Spark JARs to HDFS
hadoop fs -mkdir /spark-jars
hadoop fs -put spark-3.0.0-bin-without-hadoop/jars/* /spark-jars
Edit hive-site.xml
vim /opt/module/hive/conf/hive-site.xml
<!-- Location of the Spark dependencies (note: port 8020 must match the NameNode port) -->
<property>
    <name>spark.yarn.jars</name>
    <value>hdfs://ha01:8020/spark-jars/*</value>
</property>
<!-- Hive execution engine -->
<property>
    <name>hive.execution.engine</name>
    <value>spark</value>
</property>
# Testing Hive on Spark
Start the Hive client
bin/hive
Create a test table
create table student(id int, name string);
Test with an insert (the first run is slow because a Spark session has to be created)
insert into table student values(1,'abc');
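# Yarn configuration
# Increasing the ApplicationMaster resource share
The capacity scheduler limits the resources used by the Application Masters running concurrently in each queue. The limit is set by yarn.scheduler.capacity.maximum-am-resource-percent, whose default is 0.1: the Application Masters in a queue may use at most 10% of that queue's total resources. This prevents Application Masters from taking most of the resources and leaving Map/Reduce tasks unable to run.
In production the default is fine. A learning cluster has very few resources in total; with only 10% for Application Masters, it can happen that only one job runs at a time, because a single Application Master may already reach the 10% cap. The value can therefore be raised here.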
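A quick calculation shows why 10% is tight on a small cluster. The sketch below multiplies a queue's memory by the AM percentage. The 12288 MB queue size is an assumed example figure, the function name is made up, and the calculation ignores other scheduler details such as per-user limits and allocation rounding.

```shell
#!/bin/bash
# am_limit_mb QUEUE_MEMORY_MB MAX_AM_PERCENT
# Prints the memory in MB (rounded down) that all ApplicationMasters in the
# queue may use together.
am_limit_mb() {
    awk -v total="$1" -v pct="$2" 'BEGIN { printf "%d\n", total * pct }'
}

# Assumed example: a single queue with 12288 MB in total.
am_limit_mb 12288 0.1
am_limit_mb 12288 0.8
```

With the assumed 12288 MB queue, the default 0.1 leaves about 1228 MB for Application Masters, room for only one if each needs around 1 GB, while 0.8 leaves about 9830 MB.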
On ha01, change the following parameter in /opt/module/hadoop-3.1.3/etc/hadoop/capacity-scheduler.xml
vim capacity-scheduler.xml
<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.8</value>
</property>
Distribute the capacity-scheduler.xml configuration file
Stop any running jobs and restart the Yarn cluster
sbin/stop-yarn.sh
sbin/start-yarn.sh
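# Warehouse development environment
DBeaver or DataGrip can be used as the development tool. Both connect to Hive over JDBC, so HiveServer2 must be running.
Start HiveServer2
hiveserver2
Configure the DataGrip connection
Create the connection
Configure the connection properties
Set all properties to match the Hive beeline client configuration. On first use, the setup reports a missing JDBC driver; download it as prompted
Edit the connection and specify the database to connect to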
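# Data preparation
When a company builds a data warehouse, its business systems usually already hold some historical data. To simulate that situation, a certain amount of historical data is prepared here. The warehouse go-live date is assumed to be 2020-06-14. Details follow.
# User behavior logs
User behavior logs normally have no historical data, so only one day of logs, 2020-06-14, needs to be prepared. The steps are:
- Start the log collection pipeline, including Flume, Kafka, and so on.
- On the two log servers (ha01 and ha02), edit /opt/module/applog/application.yml and set the mock.date parameter to 2020-06-14.
- Run the log generation script lg.sh.
- Check whether the corresponding files appear on HDFS.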
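The last step can also be checked from the command line. This is a minimal sketch: it assumes the collection pipeline writes to /origin_data/gmall/log/topic_log/<date>, which is the path used by the load statement later in this document, and the function name `log_dir_for` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build the HDFS directory that holds one day of raw logs.
# The layout /origin_data/<app>/log/topic_log/<date> is assumed from the
# load statement used for ods_log.
log_dir_for() {
    local app=$1 do_date=$2
    echo "/origin_data/${app}/log/topic_log/${do_date}"
}

# Only attempt the check when the hadoop client is available on PATH.
if command -v hadoop >/dev/null 2>&1; then
    dir=$(log_dir_for gmall 2020-06-14)
    # "hadoop fs -test -d" returns 0 when the path exists and is a directory
    if hadoop fs -test -d "$dir"; then
        echo "found $dir"
    else
        echo "missing $dir"
    fi
fi
```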
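# Business data
Business data usually does have history, so data for 2020-06-10 through 2020-06-14 is prepared here. The steps are as follows.
On the ha01 node, edit /opt/module/db_log/application.properties and set the three parameters mock.date, mock.clear and mock.clear.user to the values shown in the figure.
Run the command that generates mock business data to produce the historical data for the first day, 2020-06-10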
java -jar gmall2020-mock-db-2021-01-22.jar
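Edit /opt/module/db_log/application.properties again and set the three parameters mock.date, mock.clear and mock.clear.user to the values shown in the figure.
Run the command that generates mock business data to produce the historical data for the second day, 2020-06-11.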
java -jar gmall2020-mock-db-2021-01-22.jar
After that, change only the mock.date parameter in /opt/module/db_log/application.properties, setting it in turn to 2020-06-12, 2020-06-13 and 2020-06-14, and generate the data for each date.
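The repeated edit-and-run cycle for the remaining dates could be scripted roughly as follows. This is a sketch under assumptions: it presumes the property is written as `mock.date=<value>` at the start of a line in application.properties, and the helper name `set_mock_date` is hypothetical. The loop that actually runs the jar is left commented out.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the mock.date line in a properties file.
# Assumes the key appears as "mock.date=<value>" at the start of a line.
set_mock_date() {
    local file=$1 new_date=$2
    # Write to a temporary file and move it back, which avoids relying on sed -i
    sed "s/^mock\.date=.*/mock.date=${new_date}/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Sketch of the generation loop (requires the mock jar, so it is commented out):
# cd /opt/module/db_log
# for d in 2020-06-12 2020-06-13 2020-06-14; do
#     set_mock_date application.properties "$d"
#     java -jar gmall2020-mock-db-2021-01-22.jar
# done
```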
Run the mysql_to_hdfs_init.sh script to sync the generated business data to HDFS
mysql_to_hdfs_init.sh all 2020-06-14
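Check whether the corresponding data appears on HDFS
# Building the warehouse: ODS layer
- Keep the data in its original form without any modification, so that this layer serves as a backup.
- Compress the data with LZO to save disk space. 100 GB of data can be compressed to under 10 GB.
- Create partitioned tables to avoid full table scans later. Partitioned tables are used heavily in production.
- Create external tables. In production, apart from temporary tables for personal use, which are created as managed (internal) tables, the vast majority of tables are external.
# ODS layer (user behavior data)
# Creating the log table ods_log
Create a partitioned table that supports LZO compression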
drop table if exists ods_log;
CREATE EXTERNAL TABLE ods_log (`line` string)
PARTITIONED BY (`dt` string) -- partition by date
STORED AS -- storage format; data is read with LzoTextInputFormat
    INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_log' -- location of the data on HDFS
;
Note: for LZO compression in Hive, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO
Partition plan
Load the data
load data inpath '/origin_data/gmall/log/topic_log/2020-06-14' into table ods_log partition(dt='2020-06-14');
Create an index for the LZO-compressed files
hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_log/dt=2020-06-14
# ODS layer log table load script
On ha01, create the script in the /home/damoncai/bin directory
vim hdfs_to_ods_log.sh
#!/bin/bash

# Define a variable so the value is easy to change
APP=gmall

# Use the date passed as an argument if there is one; otherwise use the day before today
if [ -n "$1" ] ;then
    do_date=$1
else
    do_date=`date -d "-1 day" +%F`
fi

echo ================== log date: $do_date ==================
sql="
load data inpath '/origin_data/$APP/log/topic_log/$do_date' into table ${APP}.ods_log partition(dt='$do_date');
"

hive -e "$sql"

hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/$APP/ods/ods_log/dt=$do_date
Note 1:
[ -n value ] tests whether the value of a variable is empty
-- value is non-empty: returns true
-- value is empty: returns false
Note 2:
For usage of the date command, see date --help
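The two notes can be tried out in isolation. The sketch below wraps the same test in a function so the behavior with and without an argument is easy to see. The function name `resolve_date` is hypothetical, and `date -d` assumes GNU date, as in the script itself.

```shell
#!/bin/bash
# Hypothetical helper: return the given date, or yesterday when no argument is passed.
resolve_date() {
    if [ -n "$1" ]; then
        # Non-empty argument: [ -n ... ] is true, so use it as-is
        echo "$1"
    else
        # Empty or missing argument: fall back to the previous day (GNU date)
        date -d "-1 day" +%F
    fi
}

resolve_date 2020-06-14   # prints the argument unchanged
resolve_date              # prints yesterday in YYYY-MM-DD form
```

Quoting "$1" matters here: with an unquoted, empty $1 the test collapses to [ -n ], which is always true.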
Add execute permission to the script
chmod 777 hdfs_to_ods_log.sh
Run the script
hdfs_to_ods_log.sh 2020-06-14
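Check the imported data
# ODS layer (business data)
The partition plan for the ODS business tables is as follows
The approach for loading the ODS business tables is as follows
# Creating the tables in Hive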
DROP TABLE IF EXISTS ods_activity_info;
CREATE EXTERNAL TABLE ods_activity_info(
`id` STRING COMMENT 'ID',
`activity_name` STRING COMMENT 'activity name',
`activity_type` STRING COMMENT 'activity type',
`start_time` STRING COMMENT 'start time',
`end_time` STRING COMMENT 'end time',
`create_time` STRING COMMENT 'creation time'
) COMMENT 'activity info table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_info/';
DROP TABLE IF EXISTS ods_activity_rule;
CREATE EXTERNAL TABLE ods_activity_rule(
`id` STRING COMMENT 'ID',
`activity_id` STRING COMMENT 'activity ID',
`activity_type` STRING COMMENT 'activity type',
`condition_amount` DECIMAL(16,2) COMMENT 'amount threshold for the discount',
`condition_num` BIGINT COMMENT 'item count threshold for the discount',
`benefit_amount` DECIMAL(16,2) COMMENT 'discount amount',
`benefit_discount` DECIMAL(16,2) COMMENT 'discount rate',
`benefit_level` STRING COMMENT 'discount level'
) COMMENT 'activity rule table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_rule/';
DROP TABLE IF EXISTS ods_base_category1;
CREATE EXTERNAL TABLE ods_base_category1(
`id` STRING COMMENT 'id',
`name` STRING COMMENT '名称'
) COMMENT '商品一级分类表'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category1/';
DROP TABLE IF EXISTS ods_base_category2;
CREATE EXTERNAL TABLE ods_base_category2(
`id` STRING COMMENT 'id',
`name` STRING COMMENT 'name',
`category1_id` STRING COMMENT 'level-1 category id'
) COMMENT 'level-2 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category2/';
DROP TABLE IF EXISTS ods_base_category3;
CREATE EXTERNAL TABLE ods_base_category3(
`id` STRING COMMENT 'id',
`name` STRING COMMENT 'name',
`category2_id` STRING COMMENT 'level-2 category id'
) COMMENT 'level-3 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category3/';
DROP TABLE IF EXISTS ods_base_dic;
CREATE EXTERNAL TABLE ods_base_dic(
`dic_code` STRING COMMENT 'code',
`dic_name` STRING COMMENT 'code name',
`parent_code` STRING COMMENT 'parent code',
`create_time` STRING COMMENT 'creation date',
`operate_time` STRING COMMENT 'operation date'
) COMMENT 'code dictionary table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_dic/';
DROP TABLE IF EXISTS ods_base_province;
CREATE EXTERNAL TABLE ods_base_province (
`id` STRING COMMENT 'ID',
`name` STRING COMMENT 'province name',
`region_id` STRING COMMENT 'region ID',
`area_code` STRING COMMENT 'area code',
`iso_code` STRING COMMENT 'ISO-3166 code, used for visualization',
`iso_3166_2` STRING COMMENT 'ISO-3166-2 code, used for visualization'
) COMMENT 'province table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_province/';
DROP TABLE IF EXISTS ods_base_region;
CREATE EXTERNAL TABLE ods_base_region (
`id` STRING COMMENT 'ID',
`region_name` STRING COMMENT 'region name'
) COMMENT 'region table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_region/';
DROP TABLE IF EXISTS ods_base_trademark;
CREATE EXTERNAL TABLE ods_base_trademark (
`id` STRING COMMENT 'ID',
`tm_name` STRING COMMENT 'brand name'
) COMMENT 'brand table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_trademark/';
DROP TABLE IF EXISTS ods_cart_info;
CREATE EXTERNAL TABLE ods_cart_info(
`id` STRING COMMENT 'ID',
`user_id` STRING COMMENT 'user id',
`sku_id` STRING COMMENT 'sku id',
`cart_price` DECIMAL(16,2) COMMENT 'price when added to the cart',
`sku_num` BIGINT COMMENT 'quantity',
`sku_name` STRING COMMENT 'sku name (redundant)',
`create_time` STRING COMMENT 'creation time',
`operate_time` STRING COMMENT 'modification time',
`is_ordered` STRING COMMENT 'whether an order has been placed',
`order_time` STRING COMMENT 'order time',
`source_type` STRING COMMENT 'source type',
`source_id` STRING COMMENT 'source ID'
) COMMENT 'cart table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_cart_info/';
DROP TABLE IF EXISTS ods_comment_info;
CREATE EXTERNAL TABLE ods_comment_info(
`id` STRING COMMENT 'ID',
`user_id` STRING COMMENT 'user ID',
`sku_id` STRING COMMENT 'product sku',
`spu_id` STRING COMMENT 'product spu',
`order_id` STRING COMMENT 'order ID',
`appraise` STRING COMMENT 'rating',
`create_time` STRING COMMENT 'comment time'
) COMMENT 'product comment table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_comment_info/';
DROP TABLE IF EXISTS ods_coupon_info;
CREATE EXTERNAL TABLE ods_coupon_info(
`id` STRING COMMENT 'coupon ID',
`coupon_name` STRING COMMENT 'coupon name',
`coupon_type` STRING COMMENT 'coupon type: 1 cash coupon, 2 discount coupon, 3 amount-threshold coupon, 4 item-count-threshold discount coupon',
`condition_amount` DECIMAL(16,2) COMMENT 'amount threshold',
`condition_num` BIGINT COMMENT 'item count threshold',
`activity_id` STRING COMMENT 'activity ID',
`benefit_amount` DECIMAL(16,2) COMMENT 'amount deducted',
`benefit_discount` DECIMAL(16,2) COMMENT 'discount rate',
`create_time` STRING COMMENT 'creation time',
`range_type` STRING COMMENT 'range type: 1 product, 2 category, 3 brand',
`limit_num` BIGINT COMMENT 'maximum number of claims',
`taken_count` BIGINT COMMENT 'number already claimed',
`start_time` STRING COMMENT 'claim start time',
`end_time` STRING COMMENT 'claim end time',
`operate_time` STRING COMMENT 'modification time',
`expire_time` STRING COMMENT 'expiration time'
) COMMENT 'coupon table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_info/';
DROP TABLE IF EXISTS ods_coupon_use;
CREATE EXTERNAL TABLE ods_coupon_use(
`id` STRING COMMENT 'ID',
`coupon_id` STRING COMMENT 'coupon ID',
`user_id` STRING COMMENT 'user ID',
`order_id` STRING COMMENT 'order ID',
`coupon_status` STRING COMMENT 'coupon status',
`get_time` STRING COMMENT 'claim time',
`using_time` STRING COMMENT 'use time (order placed)',
`used_time` STRING COMMENT 'use time (payment)',
`expire_time` STRING COMMENT 'expiration time'
) COMMENT 'coupon usage table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_use/';
DROP TABLE IF EXISTS ods_favor_info;
CREATE EXTERNAL TABLE ods_favor_info(
`id` STRING COMMENT 'ID',
`user_id` STRING COMMENT 'user id',
`sku_id` STRING COMMENT 'sku id',
`spu_id` STRING COMMENT 'spu id',
`is_cancel` STRING COMMENT 'whether cancelled',
`create_time` STRING COMMENT 'time favorited',
`cancel_time` STRING COMMENT 'cancellation time'
) COMMENT 'product favorites table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_favor_info/';
DROP TABLE IF EXISTS ods_order_detail;
CREATE EXTERNAL TABLE ods_order_detail(
`id` STRING COMMENT 'ID',
`order_id` STRING COMMENT 'order ID',
`sku_id` STRING COMMENT 'product id',
`sku_name` STRING COMMENT 'product name',
`order_price` DECIMAL(16,2) COMMENT 'product price',
`sku_num` BIGINT COMMENT 'product quantity',
`create_time` STRING COMMENT 'creation time',
`source_type` STRING COMMENT 'source type',
`source_id` STRING COMMENT 'source ID',
`split_final_amount` DECIMAL(16,2) COMMENT 'apportioned final amount',
`split_activity_amount` DECIMAL(16,2) COMMENT 'apportioned activity discount',
`split_coupon_amount` DECIMAL(16,2) COMMENT 'apportioned coupon discount'
) COMMENT 'order detail table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail/';
DROP TABLE IF EXISTS ods_order_detail_activity;
CREATE EXTERNAL TABLE ods_order_detail_activity(
`id` STRING COMMENT 'ID',
`order_id` STRING COMMENT 'order ID',
`order_detail_id` STRING COMMENT 'order detail id',
`activity_id` STRING COMMENT 'activity id',
`activity_rule_id` STRING COMMENT 'activity rule id',
`sku_id` BIGINT COMMENT 'product id',
`create_time` STRING COMMENT 'creation time'
) COMMENT 'order detail to activity association table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_activity/';
DROP TABLE IF EXISTS ods_order_detail_coupon;
CREATE EXTERNAL TABLE ods_order_detail_coupon(
`id` STRING COMMENT 'ID',
`order_id` STRING COMMENT 'order ID',
`order_detail_id` STRING COMMENT 'order detail id',
`coupon_id` STRING COMMENT 'coupon id',
`coupon_use_id` STRING COMMENT 'coupon claim record id',
`sku_id` STRING COMMENT 'product id',
`create_time` STRING COMMENT 'creation time'
) COMMENT 'order detail to coupon association table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_coupon/';
DROP TABLE IF EXISTS ods_order_info;
CREATE EXTERNAL TABLE ods_order_info (
`id` STRING COMMENT 'order ID',
`final_amount` DECIMAL(16,2) COMMENT 'final order amount',
`order_status` STRING COMMENT 'order status',
`user_id` STRING COMMENT 'user id',
`payment_way` STRING COMMENT 'payment method',
`delivery_address` STRING COMMENT 'delivery address',
`out_trade_no` STRING COMMENT 'payment transaction number',
`create_time` STRING COMMENT 'creation time',
`operate_time` STRING COMMENT 'operation time',
`expire_time` STRING COMMENT 'expiration time',
`tracking_no` STRING COMMENT 'tracking number',
`province_id` STRING COMMENT 'province ID',
`activity_reduce_amount` DECIMAL(16,2) COMMENT 'activity discount amount',
`coupon_reduce_amount` DECIMAL(16,2) COMMENT 'coupon discount amount',
`original_amount` DECIMAL(16,2) COMMENT 'original order amount',
`feight_fee` DECIMAL(16,2) COMMENT 'freight fee',
`feight_fee_reduce` DECIMAL(16,2) COMMENT 'freight fee reduction'
) COMMENT 'order table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_info/';
DROP TABLE IF EXISTS ods_order_refund_info;
CREATE EXTERNAL TABLE ods_order_refund_info(
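`id` STRING COMMENT 'ID',
`user_id` STRING COMMENT 'user ID',
`order_id` STRING COMMENT 'order ID',
`sku_id` STRING COMMENT 'product ID',
`refund_type` STRING COMMENT 'refund type',
`refund_num` BIGINT COMMENT 'number of items refunded',
`refund_amount` DECIMAL(16,2) COMMENT 'refund amount',
`refund_reason_type` STRING COMMENT 'refund reason type',
`refund_status` STRING COMMENT 'refund status',-- a refund status would normally cover buyer request, seller review, seller receipt, refund completed, and so on; these are not modelled here, so this table is handled as incremental
`create_time` STRING COMMENT 'refund time'
) COMMENT 'order refund table'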
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_refund_info/';
DROP TABLE IF EXISTS ods_order_status_log;
CREATE EXTERNAL TABLE ods_order_status_log (
`id` STRING COMMENT 'ID',
`order_id` STRING COMMENT 'order ID',
`order_status` STRING COMMENT 'order status',
`operate_time` STRING COMMENT 'modification time'
) COMMENT 'order status table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_status_log/';
DROP TABLE IF EXISTS ods_payment_info;
CREATE EXTERNAL TABLE ods_payment_info(
`id` STRING COMMENT 'ID',
`out_trade_no` STRING COMMENT 'external business number',
`order_id` STRING COMMENT 'order ID',
`user_id` STRING COMMENT 'user ID',
`payment_type` STRING COMMENT 'payment type',
`trade_no` STRING COMMENT 'transaction number',
`payment_amount` DECIMAL(16,2) COMMENT 'payment amount',
`subject` STRING COMMENT 'transaction content',
`payment_status` STRING COMMENT 'payment status',
`create_time` STRING COMMENT 'creation time',
`callback_time` STRING COMMENT 'callback time'
) COMMENT 'payment record table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_payment_info/';
DROP TABLE IF EXISTS ods_refund_payment;
CREATE EXTERNAL TABLE ods_refund_payment(
`id` STRING COMMENT 'ID',
`out_trade_no` STRING COMMENT 'external business number',
`order_id` STRING COMMENT 'order ID',
`sku_id` STRING COMMENT 'SKU ID',
`payment_type` STRING COMMENT 'payment type',
`trade_no` STRING COMMENT 'transaction number',
`refund_amount` DECIMAL(16,2) COMMENT 'refund amount',
`subject` STRING COMMENT 'transaction content',
`refund_status` STRING COMMENT 'refund status',
`create_time` STRING COMMENT 'creation time',
`callback_time` STRING COMMENT 'callback time'
) COMMENT 'refund payment record table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_refund_payment/';
DROP TABLE IF EXISTS ods_sku_attr_value;
CREATE EXTERNAL TABLE ods_sku_attr_value(
`id` STRING COMMENT 'ID',
`attr_id` STRING COMMENT 'platform attribute ID',
`value_id` STRING COMMENT 'platform attribute value ID',
`sku_id` STRING COMMENT 'product ID',
`attr_name` STRING COMMENT 'platform attribute name',
`value_name` STRING COMMENT 'platform attribute value name'
) COMMENT 'sku platform attribute table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_attr_value/';
DROP TABLE IF EXISTS ods_sku_info;
CREATE EXTERNAL TABLE ods_sku_info(
`id` STRING COMMENT 'skuId',
`spu_id` STRING COMMENT 'spuid',
`price` DECIMAL(16,2) COMMENT 'price',
`sku_name` STRING COMMENT 'product name',
`sku_desc` STRING COMMENT 'product description',
`weight` DECIMAL(16,2) COMMENT 'weight',
`tm_id` STRING COMMENT 'brand id',
`category3_id` STRING COMMENT 'category id',
`is_sale` STRING COMMENT 'whether on sale',
`create_time` STRING COMMENT 'creation time'
) COMMENT 'SKU product table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_info/';
DROP TABLE IF EXISTS ods_sku_sale_attr_value;
CREATE EXTERNAL TABLE ods_sku_sale_attr_value(
`id` STRING COMMENT 'ID',
`sku_id` STRING COMMENT 'sku_id',
`spu_id` STRING COMMENT 'spu_id',
`sale_attr_value_id` STRING COMMENT 'sale attribute value id',
`sale_attr_id` STRING COMMENT 'sale attribute id',
`sale_attr_name` STRING COMMENT 'sale attribute name',
`sale_attr_value_name` STRING COMMENT 'sale attribute value name'
) COMMENT 'sku sale attribute table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_sale_attr_value/';
DROP TABLE IF EXISTS ods_spu_info;
CREATE EXTERNAL TABLE ods_spu_info(
`id` STRING COMMENT 'spuid',
`spu_name` STRING COMMENT 'spu name',
`category3_id` STRING COMMENT 'category id',
`tm_id` STRING COMMENT 'brand id'
) COMMENT 'SPU product table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_spu_info/';
DROP TABLE IF EXISTS ods_user_info;
CREATE EXTERNAL TABLE ods_user_info(
`id` STRING COMMENT 'user id',
`login_name` STRING COMMENT 'login name',
`nick_name` STRING COMMENT 'nickname',
`name` STRING COMMENT 'real name',
`phone_num` STRING COMMENT 'phone number',
`email` STRING COMMENT 'email',
`user_level` STRING COMMENT 'user level',
`birthday` STRING COMMENT 'birthday',
`gender` STRING COMMENT 'gender',
`create_time` STRING COMMENT 'creation time',
`operate_time` STRING COMMENT 'operation time'
) COMMENT 'user table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_user_info/';
# ODS layer business tables: first-day load script
Write the script
Create the script hdfs_to_ods_db_init.sh in the /home/damoncai/bin directory
vim hdfs_to_ods_db_init.sh
#!/bin/bash

APP=gmall

# The first argument selects the table (or all); the second argument is the date and is required
if [ -n "$2" ] ;then
    do_date=$2
else
    echo "Please pass in a date argument"
    exit
fi

ods_order_info="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"
ods_order_detail="
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"
ods_sku_info="
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"
ods_user_info="
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"
ods_payment_info="
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"
ods_base_category1="
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"
ods_base_category2="
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"
ods_base_category3="
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date');"
ods_base_trademark="
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date');"
ods_activity_info="
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date');"
ods_cart_info="
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date');"
ods_comment_info="
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date');"
ods_coupon_info="
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date');"
ods_coupon_use="
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date');"
ods_favor_info="
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date');"
ods_order_refund_info="
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date');"
ods_order_status_log="
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date');"
ods_spu_info="
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date');"
ods_activity_rule="
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');"
ods_base_dic="
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date');"
ods_order_detail_activity="
load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date');"
ods_order_detail_coupon="
load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date');"
ods_refund_payment="
load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date');"
ods_sku_attr_value="
load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date');"
ods_sku_sale_attr_value="
load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date');"
# The province and region tables are not partitioned
ods_base_province="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"
ods_base_region="
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"

case $1 in
    "ods_order_info") hive -e "$ods_order_info" ;;
    "ods_order_detail") hive -e "$ods_order_detail" ;;
    "ods_sku_info") hive -e "$ods_sku_info" ;;
    "ods_user_info") hive -e "$ods_user_info" ;;
    "ods_payment_info") hive -e "$ods_payment_info" ;;
    "ods_base_category1") hive -e "$ods_base_category1" ;;
    "ods_base_category2") hive -e "$ods_base_category2" ;;
    "ods_base_category3") hive -e "$ods_base_category3" ;;
    "ods_base_trademark") hive -e "$ods_base_trademark" ;;
    "ods_activity_info") hive -e "$ods_activity_info" ;;
    "ods_cart_info") hive -e "$ods_cart_info" ;;
    "ods_comment_info") hive -e "$ods_comment_info" ;;
    "ods_coupon_info") hive -e "$ods_coupon_info" ;;
    "ods_coupon_use") hive -e "$ods_coupon_use" ;;
    "ods_favor_info") hive -e "$ods_favor_info" ;;
    "ods_order_refund_info") hive -e "$ods_order_refund_info" ;;
    "ods_order_status_log") hive -e "$ods_order_status_log" ;;
    "ods_spu_info") hive -e "$ods_spu_info" ;;
    "ods_activity_rule") hive -e "$ods_activity_rule" ;;
    "ods_base_dic") hive -e "$ods_base_dic" ;;
    "ods_order_detail_activity") hive -e "$ods_order_detail_activity" ;;
    "ods_order_detail_coupon") hive -e "$ods_order_detail_coupon" ;;
    "ods_refund_payment") hive -e "$ods_refund_payment" ;;
    "ods_sku_attr_value") hive -e "$ods_sku_attr_value" ;;
    "ods_sku_sale_attr_value") hive -e "$ods_sku_sale_attr_value" ;;
    "ods_base_province") hive -e "$ods_base_province" ;;
    "ods_base_region") hive -e "$ods_base_region" ;;
    "all") hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value$ods_base_province$ods_base_region" ;;
esac
Add execute permission to the script (for example, chmod +x hdfs_to_ods_db_init.sh)
Run the script
hdfs_to_ods_db_init.sh all 2020-06-14
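Every statement for a partitioned table in the script follows the same pattern, differing only in the table name. The sketch below shows that pattern as a small function. It is an illustration rather than a replacement for the script, and the function name `build_load_sql` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: build the LOAD DATA statement for one partitioned ODS table.
# Arguments: application name, source table name (without the ods_ prefix), date.
build_load_sql() {
    local app=$1 table=$2 do_date=$3
    echo "load data inpath '/origin_data/${app}/db/${table}/${do_date}' OVERWRITE into table ${app}.ods_${table} partition(dt='${do_date}');"
}

# Print the statement that the script would run for order_info on 2020-06-14
build_load_sql gmall order_info 2020-06-14
```

The non-partitioned tables (base_province and base_region) do not fit this helper, because their statements have no partition clause.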
# ODS layer business tables: daily load script
Write the script
Create the script hdfs_to_ods_db.sh in the /home/damoncai/bin directory
vim hdfs_to_ods_db.sh
#!/bin/bash

APP=gmall

# Use the date passed as the second argument if there is one; otherwise use the day before today
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

ods_order_info="
load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"
ods_order_detail="
load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"
ods_sku_info="
load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"
ods_user_info="
load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"
ods_payment_info="
load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"
ods_base_category1="
load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"
ods_base_category2="
load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"
ods_base_category3="
load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date');"
ods_base_trademark="
load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date');"
ods_activity_info="
load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date');"
ods_cart_info="
load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date');"
ods_comment_info="
load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date');"
ods_coupon_info="
load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date');"
ods_coupon_use="
load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date');"
ods_favor_info="
load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date');"
ods_order_refund_info="
load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date');"
ods_order_status_log="
load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date');"
ods_spu_info="
load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date');"
ods_activity_rule="
load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');"
ods_base_dic="
load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date');"
ods_order_detail_activity="
load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date');"
ods_order_detail_coupon="
load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date');"
ods_refund_payment="
load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date');"
ods_sku_attr_value="
load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date');"
ods_sku_sale_attr_value="
load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date');"
# The province and region statements are defined here but are not referenced by the case statement below
ods_base_province="
load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"
ods_base_region="
load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"

case $1 in
    "ods_order_info") hive -e "$ods_order_info" ;;
    "ods_order_detail") hive -e "$ods_order_detail" ;;
    "ods_sku_info") hive -e "$ods_sku_info" ;;
    "ods_user_info") hive -e "$ods_user_info" ;;
    "ods_payment_info") hive -e "$ods_payment_info" ;;
    "ods_base_category1") hive -e "$ods_base_category1" ;;
    "ods_base_category2") hive -e "$ods_base_category2" ;;
    "ods_base_category3") hive -e "$ods_base_category3" ;;
    "ods_base_trademark") hive -e "$ods_base_trademark" ;;
    "ods_activity_info") hive -e "$ods_activity_info" ;;
    "ods_cart_info") hive -e "$ods_cart_info" ;;
    "ods_comment_info") hive -e "$ods_comment_info" ;;
    "ods_coupon_info") hive -e "$ods_coupon_info" ;;
    "ods_coupon_use") hive -e "$ods_coupon_use" ;;
    "ods_favor_info") hive -e "$ods_favor_info" ;;
    "ods_order_refund_info") hive -e "$ods_order_refund_info" ;;
    "ods_order_status_log") hive -e "$ods_order_status_log" ;;
    "ods_spu_info") hive -e "$ods_spu_info" ;;
    "ods_activity_rule") hive -e "$ods_activity_rule" ;;
    "ods_base_dic") hive -e "$ods_base_dic" ;;
    "ods_order_detail_activity") hive -e "$ods_order_detail_activity" ;;
    "ods_order_detail_coupon") hive -e "$ods_order_detail_coupon" ;;
    "ods_refund_payment") hive -e "$ods_refund_payment" ;;
    "ods_sku_attr_value") hive -e "$ods_sku_attr_value" ;;
    "ods_sku_sale_attr_value") hive -e "$ods_sku_sale_attr_value" ;;
    "all") hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value" ;;
esac
Grant execute permission
Run the script
hdfs_to_ods_db.sh all 2020-06-14
Check that the data was imported successfully
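One possible way to check the import is to count the rows in the freshly loaded partition of a few ODS tables. The sketch below only builds and prints the count queries. `count_sql` is a hypothetical helper that is not part of the original scripts, and the three tables are just a sample.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): build a row-count
# query for one partition of an ODS table.
APP=gmall

count_sql() {
    table=$1
    do_date=$2
    echo "select count(*) from ${APP}.${table} where dt='${do_date}';"
}

# Print count queries for a sample of the tables loaded by hdfs_to_ods_db.sh.
for t in ods_order_info ods_order_detail ods_user_info; do
    count_sql "$t" 2020-06-14
done
```

Each printed statement could then be passed to hive -e on a machine with a Hive client. A count of 0 for a partition would suggest that the source directory under /origin_data was empty or missing for that date.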
# Data Warehouse Build: DIM Layer
# Product Dimension Table (Full Snapshot)
Table creation statement

    DROP TABLE IF EXISTS dim_sku_info;
    CREATE EXTERNAL TABLE dim_sku_info (
        `id` STRING COMMENT 'SKU id',
        `price` DECIMAL(16,2) COMMENT 'SKU price',
        `sku_name` STRING COMMENT 'SKU name',
        `sku_desc` STRING COMMENT 'SKU description',
        `weight` DECIMAL(16,2) COMMENT 'weight',
        `is_sale` BOOLEAN COMMENT 'whether on sale',
        `spu_id` STRING COMMENT 'SPU id',
        `spu_name` STRING COMMENT 'SPU name',
        `category3_id` STRING COMMENT 'level-3 category id',
        `category3_name` STRING COMMENT 'level-3 category name',
        `category2_id` STRING COMMENT 'level-2 category id',
        `category2_name` STRING COMMENT 'level-2 category name',
        `category1_id` STRING COMMENT 'level-1 category id',
        `category1_name` STRING COMMENT 'level-1 category name',
        `tm_id` STRING COMMENT 'brand id',
        `tm_name` STRING COMMENT 'brand name',
        `sku_attr_values` ARRAY<STRUCT<attr_id:STRING,value_id:STRING,attr_name:STRING,value_name:STRING>> COMMENT 'platform attributes',
        `sku_sale_attr_values` ARRAY<STRUCT<sale_attr_id:STRING,sale_attr_value_id:STRING,sale_attr_name:STRING,sale_attr_value_name:STRING>> COMMENT 'sale attributes',
        `create_time` STRING COMMENT 'creation time'
    ) COMMENT 'product dimension table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_sku_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
Problem: Hive reading LZO index files
Count the rows of ods_log in two different ways:

    select * from ods_log;
    Time taken: 0.706 seconds, Fetched: 2955 row(s)
    hive (gmall)> select count(*) from ods_log;
    2959
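The two queries return different row counts.
The reason is that select * from ods_log does not run a MapReduce job. It directly uses the DeprecatedLzoTextInputFormat specified in the ods_log table definition, which recognizes lzo.index files as index files.
select count(*) from ods_log does run a MapReduce job, so it first goes through hive.input.format, whose default value is CombineHiveInputFormat. That format merges the index files as if they were small data files and treats them as ordinary files, which is why the count is larger. Worse, it also prevents the LZO files from being split.
Fix: switch from CombineHiveInputFormat to HiveInputFormat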
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
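The load scripts later in this document prepend this setting to each SQL string. A minimal sketch of that pattern follows. `with_input_format` is a hypothetical helper name, and the snippet only prints the combined statement instead of calling Hive.

```shell
#!/bin/bash
# Sketch: put the input-format override in front of a Hive statement so that
# MapReduce-backed queries do not read .lzo.index files as data.
HIVE_INPUT_FORMAT="set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;"

# Hypothetical helper: return the statement with the setting on the line before it.
with_input_format() {
    printf '%s\n%s\n' "$HIVE_INPUT_FORMAT" "$1"
}

sql=$(with_input_format "select count(*) from gmall.ods_log;")
printf '%s\n' "$sql"
# To actually run it (requires a Hive client on the PATH):
# hive -e "$sql"
```

Because the setting is session-scoped, it has to be part of every hive -e invocation that runs a MapReduce query over LZO-compressed tables.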
First-day load

    with
    sku as (
        select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time
        from ods_sku_info
        where dt='2020-06-14'
    ),
    spu as (select id, spu_name from ods_spu_info where dt='2020-06-14'),
    c3 as (select id, name, category2_id from ods_base_category3 where dt='2020-06-14'),
    c2 as (select id, name, category1_id from ods_base_category2 where dt='2020-06-14'),
    c1 as (select id, name from ods_base_category1 where dt='2020-06-14'),
    tm as (select id, tm_name from ods_base_trademark where dt='2020-06-14'),
    attr as (
        select sku_id,
               collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
        from ods_sku_attr_value
        where dt='2020-06-14'
        group by sku_id
    ),
    sale_attr as (
        select sku_id,
               collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
        from ods_sku_sale_attr_value
        where dt='2020-06-14'
        group by sku_id
    )
    insert overwrite table dim_sku_info partition(dt='2020-06-14')
    select
        sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale,
        sku.spu_id, spu.spu_name,
        sku.category3_id, c3.name,
        c3.category2_id, c2.name,
        c2.category1_id, c1.name,
        sku.tm_id, tm.tm_name,
        attr.attrs, sale_attr.sale_attrs,
        sku.create_time
    from sku
    left join spu on sku.spu_id=spu.id
    left join c3 on sku.category3_id=c3.id
    left join c2 on c3.category2_id=c2.id
    left join c1 on c2.category1_id=c1.id
    left join tm on sku.tm_id=tm.id
    left join attr on sku.id=attr.sku_id
    left join sale_attr on sku.id=sale_attr.sku_id;
Daily load

    with
    sku as (
        select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time
        from ods_sku_info
        where dt='2020-06-15'
    ),
    spu as (select id, spu_name from ods_spu_info where dt='2020-06-15'),
    c3 as (select id, name, category2_id from ods_base_category3 where dt='2020-06-15'),
    c2 as (select id, name, category1_id from ods_base_category2 where dt='2020-06-15'),
    c1 as (select id, name from ods_base_category1 where dt='2020-06-15'),
    tm as (select id, tm_name from ods_base_trademark where dt='2020-06-15'),
    attr as (
        select sku_id,
               collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
        from ods_sku_attr_value
        where dt='2020-06-15'
        group by sku_id
    ),
    sale_attr as (
        select sku_id,
               collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
        from ods_sku_sale_attr_value
        where dt='2020-06-15'
        group by sku_id
    )
    insert overwrite table dim_sku_info partition(dt='2020-06-15')
    select
        sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale,
        sku.spu_id, spu.spu_name,
        sku.category3_id, c3.name,
        c3.category2_id, c2.name,
        c2.category1_id, c1.name,
        sku.tm_id, tm.tm_name,
        attr.attrs, sale_attr.sale_attrs,
        sku.create_time
    from sku
    left join spu on sku.spu_id=spu.id
    left join c3 on sku.category3_id=c3.id
    left join c2 on c3.category2_id=c2.id
    left join c1 on c2.category1_id=c1.id
    left join tm on sku.tm_id=tm.id
    left join attr on sku.id=attr.sku_id
    left join sale_attr on sku.id=sale_attr.sku_id;
# Coupon Dimension Table (Full Snapshot)
Table creation statement

    DROP TABLE IF EXISTS dim_coupon_info;
    CREATE EXTERNAL TABLE dim_coupon_info(
        `id` STRING COMMENT 'coupon id',
        `coupon_name` STRING COMMENT 'coupon name',
        `coupon_type` STRING COMMENT 'coupon type: 1 cash coupon, 2 discount coupon, 3 spend-threshold coupon, 4 quantity-threshold discount coupon',
        `condition_amount` DECIMAL(16,2) COMMENT 'threshold amount',
        `condition_num` BIGINT COMMENT 'threshold item count',
        `activity_id` STRING COMMENT 'activity id',
        `benefit_amount` DECIMAL(16,2) COMMENT 'amount off',
        `benefit_discount` DECIMAL(16,2) COMMENT 'discount',
        `create_time` STRING COMMENT 'creation time',
        `range_type` STRING COMMENT 'scope type: 1 product, 2 category, 3 brand',
        `limit_num` BIGINT COMMENT 'maximum number of claims',
        `taken_count` BIGINT COMMENT 'number already claimed',
        `start_time` STRING COMMENT 'start date for claiming',
        `end_time` STRING COMMENT 'end date for claiming',
        `operate_time` STRING COMMENT 'modification time',
        `expire_time` STRING COMMENT 'expiration time'
    ) COMMENT 'coupon dimension table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_coupon_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load

    insert overwrite table dim_coupon_info partition(dt='2020-06-14')
    select
        id, coupon_name, coupon_type, condition_amount, condition_num, activity_id,
        benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count,
        start_time, end_time, operate_time, expire_time
    from ods_coupon_info
    where dt='2020-06-14';
Daily load

    insert overwrite table dim_coupon_info partition(dt='2020-06-15')
    select
        id, coupon_name, coupon_type, condition_amount, condition_num, activity_id,
        benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count,
        start_time, end_time, operate_time, expire_time
    from ods_coupon_info
    where dt='2020-06-15';
# Activity Dimension Table (Full Snapshot)
Table creation statement

    DROP TABLE IF EXISTS dim_activity_rule_info;
    CREATE EXTERNAL TABLE dim_activity_rule_info(
        `activity_rule_id` STRING COMMENT 'activity rule id',
        `activity_id` STRING COMMENT 'activity id',
        `activity_name` STRING COMMENT 'activity name',
        `activity_type` STRING COMMENT 'activity type',
        `start_time` STRING COMMENT 'start time',
        `end_time` STRING COMMENT 'end time',
        `create_time` STRING COMMENT 'creation time',
        `condition_amount` DECIMAL(16,2) COMMENT 'threshold amount',
        `condition_num` BIGINT COMMENT 'threshold item count',
        `benefit_amount` DECIMAL(16,2) COMMENT 'discount amount',
        `benefit_discount` DECIMAL(16,2) COMMENT 'discount rate',
        `benefit_level` STRING COMMENT 'discount level'
    ) COMMENT 'activity information table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_activity_rule_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load

    insert overwrite table dim_activity_rule_info partition(dt='2020-06-14')
    select
        ar.id, ar.activity_id, ai.activity_name, ar.activity_type,
        ai.start_time, ai.end_time, ai.create_time,
        ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level
    from (
        select id, activity_id, activity_type, condition_amount, condition_num,
               benefit_amount, benefit_discount, benefit_level
        from ods_activity_rule
        where dt='2020-06-14'
    ) ar
    left join (
        select id, activity_name, start_time, end_time, create_time
        from ods_activity_info
        where dt='2020-06-14'
    ) ai
    on ar.activity_id=ai.id;
Daily load

    insert overwrite table dim_activity_rule_info partition(dt='2020-06-15')
    select
        ar.id, ar.activity_id, ai.activity_name, ar.activity_type,
        ai.start_time, ai.end_time, ai.create_time,
        ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level
    from (
        select id, activity_id, activity_type, condition_amount, condition_num,
               benefit_amount, benefit_discount, benefit_level
        from ods_activity_rule
        where dt='2020-06-15'
    ) ar
    left join (
        select id, activity_name, start_time, end_time, create_time
        from ods_activity_info
        where dt='2020-06-15'
    ) ai
    on ar.activity_id=ai.id;
# Region Dimension Table (Special)
Table creation statement

    DROP TABLE IF EXISTS dim_base_province;
    CREATE EXTERNAL TABLE dim_base_province (
        `id` STRING COMMENT 'id',
        `province_name` STRING COMMENT 'province name',
        `area_code` STRING COMMENT 'area code',
        `iso_code` STRING COMMENT 'ISO-3166 code, used for visualization',
        `iso_3166_2` STRING COMMENT 'ISO-3166-2 code, used for visualization',
        `region_id` STRING COMMENT 'region id',
        `region_name` STRING COMMENT 'region name'
    ) COMMENT 'region dimension table'
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_base_province/'
    TBLPROPERTIES ("parquet.compression"="lzo");
Data loading
The region dimension data is relatively stable and rarely changes, so it does not need to be loaded daily.

    insert overwrite table dim_base_province
    select
        bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2, bp.region_id, br.region_name
    from ods_base_province bp
    join ods_base_region br on bp.region_id = br.id;
# Date Dimension Table (Special)
Table creation statement

    DROP TABLE IF EXISTS dim_date_info;
    CREATE EXTERNAL TABLE dim_date_info(
        `date_id` STRING COMMENT 'date',
        `week_id` STRING COMMENT 'week id',
        `week_day` STRING COMMENT 'day of week',
        `day` STRING COMMENT 'day of month',
        `month` STRING COMMENT 'month',
        `quarter` STRING COMMENT 'quarter',
        `year` STRING COMMENT 'year',
        `is_workday` STRING COMMENT 'whether it is a workday',
        `holiday_id` STRING COMMENT 'holiday'
    ) COMMENT 'date dimension table'
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_date_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
Data loading
Normally, the data in the date dimension table does not come from the business system but is written by hand. Because the data is predictable, there is no need to import it every day; a full year can usually be imported at once.
Create a temporary table

    DROP TABLE IF EXISTS tmp_dim_date_info;
    CREATE EXTERNAL TABLE tmp_dim_date_info (
        `date_id` STRING COMMENT 'date',
        `week_id` STRING COMMENT 'week id',
        `week_day` STRING COMMENT 'day of week',
        `day` STRING COMMENT 'day of month',
        `month` STRING COMMENT 'month',
        `quarter` STRING COMMENT 'quarter',
        `year` STRING COMMENT 'year',
        `is_workday` STRING COMMENT 'whether it is a workday',
        `holiday_id` STRING COMMENT 'holiday'
    ) COMMENT 'date dimension table'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/tmp/tmp_dim_date_info/';
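Upload the data file to the temporary table's path on HDFS, /warehouse/gmall/tmp/tmp_dim_date_info/
Run the following statement to load it into the date dimension table
insert overwrite table dim_date_info select * from tmp_dim_date_info;
Check that the data was imported successfully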
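The data file itself is not shown here. As a rough illustration of what a hand-written, tab-separated file for tmp_dim_date_info could look like, the sketch below generates rows with GNU date. Several choices are assumptions rather than facts about the original file: week_id is taken to be the ISO week number, week_day runs from 1 (Monday) to 7 (Sunday), is_workday is derived from the weekday alone (a real calendar would also account for public holidays), and holiday_id is written as \N, the default NULL marker for Hive delimited text. `gen_date_rows` is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch: emit one tab-separated row per day in the column order of
# tmp_dim_date_info. Requires GNU date. See the assumptions in the text above.
gen_date_rows() (
    start=$1
    days=$2
    i=0
    while [ "$i" -lt "$days" ]; do
        # $1 date, $2 ISO week, $3 weekday (1-7), $4 day of month, $5 month, $6 year
        set -- $(date -u -d "$start +$i day" "+%F %V %u %-d %-m %Y")
        quarter=$(( ($5 + 2) / 3 ))
        if [ "$3" -le 5 ]; then is_workday=1; else is_workday=0; fi
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
            "$1" "$2" "$3" "$4" "$5" "$quarter" "$6" "$is_workday" '\N'
        i=$((i + 1))
    done
)

# Preview three days; a whole year would be: gen_date_rows 2020-01-01 366 > date_info.txt
gen_date_rows 2020-06-14 3
```

The resulting file could then be uploaded with hadoop fs -put to the temporary table's location before running the insert statement above.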
# User Dimension Table (Zipper Table)
# Zipper Table Overview
Why build a zipper table
How to use a zipper table
How a zipper table is formed
# Building the Zipper Table
Table creation statement

    DROP TABLE IF EXISTS dim_user_info;
    CREATE EXTERNAL TABLE dim_user_info(
        `id` STRING COMMENT 'user id',
        `login_name` STRING COMMENT 'login name',
        `nick_name` STRING COMMENT 'nickname',
        `name` STRING COMMENT 'real name',
        `phone_num` STRING COMMENT 'phone number',
        `email` STRING COMMENT 'email',
        `user_level` STRING COMMENT 'user level',
        `birthday` STRING COMMENT 'birthday',
        `gender` STRING COMMENT 'gender',
        `create_time` STRING COMMENT 'creation time',
        `operate_time` STRING COMMENT 'operation time',
        `start_date` STRING COMMENT 'start date',
        `end_date` STRING COMMENT 'end date'
    ) COMMENT 'user table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_user_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
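Partition planning
Data loading
First-day load
The first-day load initializes the zipper table: all historical users up to the initialization date are imported into the zipper table in one pass. The first partition of ods_user_info, 2020-06-14, currently holds all historical users, so that partition's data is processed and then loaded into the 9999-99-99 partition of the zipper table.

    insert overwrite table dim_user_info partition(dt='9999-99-99')
    select
        id, login_name, nick_name, md5(name), md5(phone_num), md5(email),
        user_level, birthday, gender, create_time, operate_time,
        '2020-06-14', '9999-99-99'
    from ods_user_info
    where dt='2020-06-14';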
Daily load
SQL

    set hive.exec.dynamic.partition.mode=nonstrict;
    with tmp as (
        select
            old.id old_id, old.login_name old_login_name, old.nick_name old_nick_name,
            old.name old_name, old.phone_num old_phone_num, old.email old_email,
            old.user_level old_user_level, old.birthday old_birthday, old.gender old_gender,
            old.create_time old_create_time, old.operate_time old_operate_time,
            old.start_date old_start_date, old.end_date old_end_date,
            new.id new_id, new.login_name new_login_name, new.nick_name new_nick_name,
            new.name new_name, new.phone_num new_phone_num, new.email new_email,
            new.user_level new_user_level, new.birthday new_birthday, new.gender new_gender,
            new.create_time new_create_time, new.operate_time new_operate_time,
            new.start_date new_start_date, new.end_date new_end_date
        from (
            select id, login_name, nick_name, name, phone_num, email, user_level, birthday, gender,
                   create_time, operate_time, start_date, end_date
            from dim_user_info
            where dt='9999-99-99'
        ) old
        full outer join (
            select id, login_name, nick_name, md5(name) name, md5(phone_num) phone_num, md5(email) email,
                   user_level, birthday, gender, create_time, operate_time,
                   '2020-06-15' start_date, '9999-99-99' end_date
            from ods_user_info
            where dt='2020-06-15'
        ) new
        on old.id=new.id
    )
    insert overwrite table dim_user_info partition(dt)
    select
        nvl(new_id,old_id), nvl(new_login_name,old_login_name), nvl(new_nick_name,old_nick_name),
        nvl(new_name,old_name), nvl(new_phone_num,old_phone_num), nvl(new_email,old_email),
        nvl(new_user_level,old_user_level), nvl(new_birthday,old_birthday), nvl(new_gender,old_gender),
        nvl(new_create_time,old_create_time), nvl(new_operate_time,old_operate_time),
        nvl(new_start_date,old_start_date), nvl(new_end_date,old_end_date),
        nvl(new_end_date,old_end_date) dt
    from tmp
    union all
    select
        old_id, old_login_name, old_nick_name, old_name, old_phone_num, old_email,
        old_user_level, old_birthday, old_gender, old_create_time, old_operate_time,
        old_start_date,
        cast(date_add('2020-06-15',-1) as string),
        cast(date_add('2020-06-15',-1) as string) dt
    from tmp
    where new_id is not null and old_id is not null;
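To make the merge logic easier to follow, here is a small sketch that replays the same idea on CSV rows with awk instead of Hive. It is only an illustration under simplifying assumptions: the row layout (id, level, start_date, end_date) is a hypothetical cut-down version of the real table, `zip_merge` is a made-up helper name, and GNU date stands in for date_add.

```shell
#!/bin/bash
# Sketch of the daily zipper-table merge on CSV rows.
# old file: open rows (id,level,start_date,end_date) from the 9999-99-99 partition
# new file: new or changed users (id,level) from the ODS partition of do_date
zip_merge() (
    old_file=$1
    new_file=$2
    do_date=$3
    prev_date=$(date -u -d "$do_date -1 day" +%F)   # plays the role of date_add(do_date,-1)
    awk -F, -v OFS=, -v d="$do_date" -v p="$prev_date" '
        FILENAME == ARGV[1] { old[$1] = $0; next }   # remember every open row by id
        {
            print $1, $2, d, "9999-99-99"            # new or changed user: fresh open row
            if ($1 in old) {                         # changed user: close the previous row
                split(old[$1], o, ",")
                print o[1], o[2], o[3], p
            }
            seen[$1] = 1
        }
        END {
            for (id in old)                          # unchanged users keep their open row
                if (!(id in seen)) print old[id]
        }
    ' "$old_file" "$new_file" | LC_ALL=C sort
)

tmp=$(mktemp -d)
printf '%s\n' '1,bronze,2020-06-14,9999-99-99' '2,bronze,2020-06-14,9999-99-99' > "$tmp/old.csv"
printf '%s\n' '2,silver' '3,bronze' > "$tmp/new.csv"
result=$(zip_merge "$tmp/old.csv" "$tmp/new.csv" 2020-06-15)
printf '%s\n' "$result"
rm -r "$tmp"
```

In this toy run user 1 is unchanged and keeps one open row, user 2 changed and ends up with a closed row plus a new open row, and user 3 is new and gets a single open row. Rows ending in 9999-99-99 correspond to the first branch of the union all in the SQL; the closed row corresponds to the second branch.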
# DIM Layer First-Day Load Script
Write the script
Create the script ods_to_dim_db_init.sh in the /home/damoncai/bin directory
vim ods_to_dim_db_init.sh

    #!/bin/bash

    APP=gmall

    if [ -n "$2" ]; then
        do_date=$2
    else
        echo "Please pass a date argument"
        exit
    fi

    dim_user_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_user_info partition(dt='9999-99-99')
    select
        id, login_name, nick_name, md5(name), md5(phone_num), md5(email),
        user_level, birthday, gender, create_time, operate_time,
        '$do_date', '9999-99-99'
    from ${APP}.ods_user_info
    where dt='$do_date';
    "

    dim_sku_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    with
    sku as (
        select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time
        from ${APP}.ods_sku_info
        where dt='$do_date'
    ),
    spu as (select id, spu_name from ${APP}.ods_spu_info where dt='$do_date'),
    c3 as (select id, name, category2_id from ${APP}.ods_base_category3 where dt='$do_date'),
    c2 as (select id, name, category1_id from ${APP}.ods_base_category2 where dt='$do_date'),
    c1 as (select id, name from ${APP}.ods_base_category1 where dt='$do_date'),
    tm as (select id, tm_name from ${APP}.ods_base_trademark where dt='$do_date'),
    attr as (
        select sku_id,
               collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
        from ${APP}.ods_sku_attr_value
        where dt='$do_date'
        group by sku_id
    ),
    sale_attr as (
        select sku_id,
               collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
        from ${APP}.ods_sku_sale_attr_value
        where dt='$do_date'
        group by sku_id
    )
    insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date')
    select
        sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale,
        sku.spu_id, spu.spu_name,
        sku.category3_id, c3.name,
        c3.category2_id, c2.name,
        c2.category1_id, c1.name,
        sku.tm_id, tm.tm_name,
        attr.attrs, sale_attr.sale_attrs,
        sku.create_time
    from sku
    left join spu on sku.spu_id=spu.id
    left join c3 on sku.category3_id=c3.id
    left join c2 on c3.category2_id=c2.id
    left join c1 on c2.category1_id=c1.id
    left join tm on sku.tm_id=tm.id
    left join attr on sku.id=attr.sku_id
    left join sale_attr on sku.id=sale_attr.sku_id;
    "

    dim_base_province="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_base_province
    select
        bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2, bp.region_id, br.region_name
    from ${APP}.ods_base_province bp
    join ${APP}.ods_base_region br on bp.region_id = br.id;
    "

    dim_coupon_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date')
    select
        id, coupon_name, coupon_type, condition_amount, condition_num, activity_id,
        benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count,
        start_time, end_time, operate_time, expire_time
    from ${APP}.ods_coupon_info
    where dt='$do_date';
    "

    dim_activity_rule_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date')
    select
        ar.id, ar.activity_id, ai.activity_name, ar.activity_type,
        ai.start_time, ai.end_time, ai.create_time,
        ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level
    from (
        select id, activity_id, activity_type, condition_amount, condition_num,
               benefit_amount, benefit_discount, benefit_level
        from ${APP}.ods_activity_rule
        where dt='$do_date'
    ) ar
    left join (
        select id, activity_name, start_time, end_time, create_time
        from ${APP}.ods_activity_info
        where dt='$do_date'
    ) ai
    on ar.activity_id=ai.id;
    "

    case $1 in
        "dim_user_info") hive -e "$dim_user_info" ;;
        "dim_sku_info") hive -e "$dim_sku_info" ;;
        "dim_base_province") hive -e "$dim_base_province" ;;
        "dim_coupon_info") hive -e "$dim_coupon_info" ;;
        "dim_activity_rule_info") hive -e "$dim_activity_rule_info" ;;
        "all") hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info$dim_base_province" ;;
    esac
Add execute permission
Run the script
ods_to_dim_db_init.sh all 2020-06-14
# DIM Layer Daily Load Script
Write the script
Create the script ods_to_dim_db.sh in the /home/damoncai/bin directory
vim ods_to_dim_db.sh

    #!/bin/bash

    APP=gmall

    # Use the date passed as the second argument; if none is given, default to yesterday.
    if [ -n "$2" ]; then
        do_date=$2
    else
        do_date=$(date -d "-1 day" +%F)
    fi

    dim_user_info="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    with tmp as (
        select
            old.id old_id, old.login_name old_login_name, old.nick_name old_nick_name,
            old.name old_name, old.phone_num old_phone_num, old.email old_email,
            old.user_level old_user_level, old.birthday old_birthday, old.gender old_gender,
            old.create_time old_create_time, old.operate_time old_operate_time,
            old.start_date old_start_date, old.end_date old_end_date,
            new.id new_id, new.login_name new_login_name, new.nick_name new_nick_name,
            new.name new_name, new.phone_num new_phone_num, new.email new_email,
            new.user_level new_user_level, new.birthday new_birthday, new.gender new_gender,
            new.create_time new_create_time, new.operate_time new_operate_time,
            new.start_date new_start_date, new.end_date new_end_date
        from (
            select id, login_name, nick_name, name, phone_num, email, user_level, birthday, gender,
                   create_time, operate_time, start_date, end_date
            from ${APP}.dim_user_info
            where dt='9999-99-99'
            and start_date<'$do_date'
        ) old
        full outer join (
            select id, login_name, nick_name, md5(name) name, md5(phone_num) phone_num, md5(email) email,
                   user_level, birthday, gender, create_time, operate_time,
                   '$do_date' start_date, '9999-99-99' end_date
            from ${APP}.ods_user_info
            where dt='$do_date'
        ) new
        on old.id=new.id
    )
    insert overwrite table ${APP}.dim_user_info partition(dt)
    select
        nvl(new_id,old_id), nvl(new_login_name,old_login_name), nvl(new_nick_name,old_nick_name),
        nvl(new_name,old_name), nvl(new_phone_num,old_phone_num), nvl(new_email,old_email),
        nvl(new_user_level,old_user_level), nvl(new_birthday,old_birthday), nvl(new_gender,old_gender),
        nvl(new_create_time,old_create_time), nvl(new_operate_time,old_operate_time),
        nvl(new_start_date,old_start_date), nvl(new_end_date,old_end_date),
        nvl(new_end_date,old_end_date) dt
    from tmp
    union all
    select
        old_id, old_login_name, old_nick_name, old_name, old_phone_num, old_email,
        old_user_level, old_birthday, old_gender, old_create_time, old_operate_time,
        old_start_date,
        cast(date_add('$do_date',-1) as string),
        cast(date_add('$do_date',-1) as string) dt
    from tmp
    where new_id is not null and old_id is not null;
    "

    dim_sku_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    with
    sku as (
        select id, price, sku_name, sku_desc, weight, is_sale, spu_id, category3_id, tm_id, create_time
        from ${APP}.ods_sku_info
        where dt='$do_date'
    ),
    spu as (select id, spu_name from ${APP}.ods_spu_info where dt='$do_date'),
    c3 as (select id, name, category2_id from ${APP}.ods_base_category3 where dt='$do_date'),
    c2 as (select id, name, category1_id from ${APP}.ods_base_category2 where dt='$do_date'),
    c1 as (select id, name from ${APP}.ods_base_category1 where dt='$do_date'),
    tm as (select id, tm_name from ${APP}.ods_base_trademark where dt='$do_date'),
    attr as (
        select sku_id,
               collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
        from ${APP}.ods_sku_attr_value
        where dt='$do_date'
        group by sku_id
    ),
    sale_attr as (
        select sku_id,
               collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
        from ${APP}.ods_sku_sale_attr_value
        where dt='$do_date'
        group by sku_id
    )
    insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date')
    select
        sku.id, sku.price, sku.sku_name, sku.sku_desc, sku.weight, sku.is_sale,
        sku.spu_id, spu.spu_name,
        sku.category3_id, c3.name,
        c3.category2_id, c2.name,
        c2.category1_id, c1.name,
        sku.tm_id, tm.tm_name,
        attr.attrs, sale_attr.sale_attrs,
        sku.create_time
    from sku
    left join spu on sku.spu_id=spu.id
    left join c3 on sku.category3_id=c3.id
    left join c2 on c3.category2_id=c2.id
    left join c1 on c2.category1_id=c1.id
    left join tm on sku.tm_id=tm.id
    left join attr on sku.id=attr.sku_id
    left join sale_attr on sku.id=sale_attr.sku_id;
    "

    dim_base_province="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_base_province
    select
        bp.id, bp.name, bp.area_code, bp.iso_code, bp.iso_3166_2, bp.region_id, br.region_name
    from ${APP}.ods_base_province bp
    join ${APP}.ods_base_region br on bp.region_id = br.id;
    "

    dim_coupon_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date')
    select
        id, coupon_name, coupon_type, condition_amount, condition_num, activity_id,
        benefit_amount, benefit_discount, create_time, range_type, limit_num, taken_count,
        start_time, end_time, operate_time, expire_time
    from ${APP}.ods_coupon_info
    where dt='$do_date';
    "

    dim_activity_rule_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date')
    select
        ar.id, ar.activity_id, ai.activity_name, ar.activity_type,
        ai.start_time, ai.end_time, ai.create_time,
        ar.condition_amount, ar.condition_num, ar.benefit_amount, ar.benefit_discount, ar.benefit_level
    from (
        select id, activity_id, activity_type, condition_amount, condition_num,
               benefit_amount, benefit_discount, benefit_level
        from ${APP}.ods_activity_rule
        where dt='$do_date'
    ) ar
    left join (
        select id, activity_name, start_time, end_time, create_time
        from ${APP}.ods_activity_info
        where dt='$do_date'
    ) ai
    on ar.activity_id=ai.id;
    "

    case $1 in
        "dim_user_info") hive -e "$dim_user_info" ;;
        "dim_sku_info") hive -e "$dim_sku_info" ;;
        "dim_base_province") hive -e "$dim_base_province" ;;
        "dim_coupon_info") hive -e "$dim_coupon_info" ;;
        "dim_activity_rule_info") hive -e "$dim_activity_rule_info" ;;
        "all") hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info" ;;
    esac
Add execute permission
Run the script
ods_to_dim_db.sh all 2020-06-14
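Because the zipper merge for a given date reads the open rows left behind by earlier runs, catching up on several missed days presumably has to happen one day at a time, oldest first. The sketch below only prints the commands for such a backfill. `backfill_commands` is a hypothetical helper, not part of the original scripts, and it relies on GNU date.

```shell
#!/bin/bash
# Sketch: print the daily DIM load command for every date in a range,
# in chronological order. Nothing is executed; this is a dry run.
backfill_commands() (
    day=$1
    end=$2
    # Compare dates as integers (YYYYMMDD) after removing the dashes.
    while [ "$(echo "$day" | tr -d '-')" -le "$(echo "$end" | tr -d '-')" ]; do
        echo "ods_to_dim_db.sh all $day"
        day=$(date -u -d "$day +1 day" +%F)
    done
)

backfill_commands 2020-06-15 2020-06-17
```

Piping the output to sh would run the loads sequentially; stopping at the first failure is advisable, since a later day would otherwise merge against an incomplete 9999-99-99 partition.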
# Data Warehouse Build: DWD Layer
# DWD Layer (User Behavior Logs)
# Log Parsing Approach
Log structure review
- Page tracking logs
- Start logs
Log parsing approach
# Using the get_json_object Function
Data
[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]
Extract the first JSON object
hive (gmall)> select get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]','$[0]');
Result: {"name":"大郎","sex":"男","age":"25"}
Extract the value of the age field from the first JSON object
hive (gmall)> SELECT get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]',"$[0].age");
Result: 25
# Start Log Table
**Start log parsing approach:** Each row in the start log table corresponds to one app start record, which should contain both the common information and the start information from the log. First select all logs that contain a start field, then parse each field with the get_json_object function.
Table creation statement

    DROP TABLE IF EXISTS dwd_start_log;
    CREATE EXTERNAL TABLE dwd_start_log(
        `area_code` STRING COMMENT 'area code',
        `brand` STRING COMMENT 'phone brand',
        `channel` STRING COMMENT 'channel',
        `is_new` STRING COMMENT 'whether first start',
        `model` STRING COMMENT 'phone model',
        `mid_id` STRING COMMENT 'device id',
        `os` STRING COMMENT 'operating system',
        `user_id` STRING COMMENT 'member id',
        `version_code` STRING COMMENT 'app version code',
        `entry` STRING COMMENT 'entry: icon (app icon), notice (notification), install (start after install)',
        `loading_time` BIGINT COMMENT 'startup loading time',
        `open_ad_id` STRING COMMENT 'ad page id',
        `open_ad_ms` BIGINT COMMENT 'total ad play time',
        `open_ad_skip_ms` BIGINT COMMENT 'time point at which the user skipped the ad',
        `ts` BIGINT COMMENT 'timestamp'
    ) COMMENT 'start log table'
    PARTITIONED BY (`dt` STRING) -- partition by date
    STORED AS PARQUET -- Parquet columnar storage
    LOCATION '/warehouse/gmall/dwd/dwd_start_log' -- storage location on HDFS
    TBLPROPERTIES('parquet.compression'='lzo') -- LZO compression
    ;
Data import

    insert overwrite table dwd_start_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.start.entry'),
        get_json_object(line,'$.start.loading_time'),
        get_json_object(line,'$.start.open_ad_id'),
        get_json_object(line,'$.start.open_ad_ms'),
        get_json_object(line,'$.start.open_ad_skip_ms'),
        get_json_object(line,'$.ts')
    from ods_log
    where dt='2020-06-14'
    and get_json_object(line,'$.start') is not null;
View the data
select * from dwd_start_log where dt='2020-06-14' limit 2;
# Page log table
**Page log parsing approach:** Each row in the page log table corresponds to one page view record, which should contain the common information and the page information from the log. First filter out all log lines that contain the page field, then parse each field with the get_json_object function.
Table creation statement
DROP TABLE IF EXISTS dwd_page_log;
CREATE EXTERNAL TABLE dwd_page_log(
    `area_code` STRING COMMENT 'Area code',
    `brand` STRING COMMENT 'Phone brand',
    `channel` STRING COMMENT 'Channel',
    `is_new` STRING COMMENT 'Whether this is the first launch',
    `model` STRING COMMENT 'Phone model',
    `mid_id` STRING COMMENT 'Device ID',
    `os` STRING COMMENT 'Operating system',
    `user_id` STRING COMMENT 'Member ID',
    `version_code` STRING COMMENT 'App version',
    `during_time` BIGINT COMMENT 'Duration in milliseconds',
    `page_item` STRING COMMENT 'Target ID',
    `page_item_type` STRING COMMENT 'Target type',
    `last_page_id` STRING COMMENT 'Previous page ID',
    `page_id` STRING COMMENT 'Page ID',
    `source_type` STRING COMMENT 'Source type',
    `ts` BIGINT COMMENT 'Timestamp'
) COMMENT 'Page log table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_page_log'
TBLPROPERTIES('parquet.compression'='lzo');
Data import
insert overwrite table dwd_page_log partition(dt='2020-06-14') select get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.during_time'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(line,'$.ts') from ods_log where dt='2020-06-14' and get_json_object(line,'$.page') is not null;
View the data
select * from dwd_page_log where dt='2020-06-14' limit 2;
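# Action log table
**Action log parsing approach:** Each row in the action log table corresponds to one user action record, which should contain the common information, the page information, and the action information. First filter out the log lines that contain the actions field, then use a UDTF to "explode" the actions array (similar to what the explode function does), and finally parse each field with the get_json_object function.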
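The explode step described above can be sketched in plain Python. This is only an illustration of the row fan-out, not the Hive UDTF itself; the field names follow the JSON paths used in the load statement of this section, while the sample values (`mid_1`, `good_detail`, the action IDs and timestamps) are made up.

```python
import json


def explode_actions(line):
    """Yield one flat row per element of the "actions" array in a log line."""
    log = json.loads(line)
    actions = log.get("actions")
    if not actions:  # mirrors the "is not null" filter; also skips empty arrays
        return
    common = log.get("common", {})
    page = log.get("page", {})
    for action in actions:
        # Common and page fields are repeated on every exploded row.
        yield {
            "mid_id": common.get("mid"),
            "page_id": page.get("page_id"),
            "action_id": action.get("action_id"),
            "item": action.get("item"),
            "item_type": action.get("item_type"),
            "ts": action.get("ts"),
        }


# Hypothetical log line with two actions on the same page.
sample = json.dumps({
    "common": {"mid": "mid_1"},
    "page": {"page_id": "good_detail"},
    "actions": [
        {"action_id": "favor_add", "item": "3", "item_type": "sku_id", "ts": 1592100000000},
        {"action_id": "cart_add", "item": "3", "item_type": "sku_id", "ts": 1592100005000},
    ],
    "ts": 1592099990000,
})
rows = list(explode_actions(sample))
no_rows = list(explode_actions('{"common": {"mid": "mid_2"}}'))
```

One input line with two actions produces two output rows, and each row takes its ts from the action element rather than from the enclosing log line, matching `get_json_object(action,'$.ts')` in the load statement.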
Table creation statement
DROP TABLE IF EXISTS dwd_action_log;
CREATE EXTERNAL TABLE dwd_action_log(
    `area_code` STRING COMMENT 'Area code',
    `brand` STRING COMMENT 'Phone brand',
    `channel` STRING COMMENT 'Channel',
    `is_new` STRING COMMENT 'Whether this is the first launch',
    `model` STRING COMMENT 'Phone model',
    `mid_id` STRING COMMENT 'Device ID',
    `os` STRING COMMENT 'Operating system',
    `user_id` STRING COMMENT 'Member ID',
    `version_code` STRING COMMENT 'App version',
    `during_time` BIGINT COMMENT 'Duration in milliseconds',
    `page_item` STRING COMMENT 'Target ID',
    `page_item_type` STRING COMMENT 'Target type',
    `last_page_id` STRING COMMENT 'Previous page ID',
    `page_id` STRING COMMENT 'Page ID',
    `source_type` STRING COMMENT 'Source type',
    `action_id` STRING COMMENT 'Action ID',
    `item` STRING COMMENT 'Target ID',
    `item_type` STRING COMMENT 'Target type',
    `ts` BIGINT COMMENT 'Timestamp'
) COMMENT 'Action log table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_action_log'
TBLPROPERTIES('parquet.compression'='lzo');
Creating the UDTF: design approach
Creating the UDTF: writing the code
Add the following dependency
<dependencies>
    <!-- Hive dependency -->
    <dependency>
        <groupId>org.apache.hive</groupId>
        <artifactId>hive-exec</artifactId>
        <version>3.1.2</version>
    </dependency>
</dependencies>
Implementation
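package top.damoncai.udtf;

import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
import org.json.JSONArray;

import java.util.ArrayList;
import java.util.List;

public class ExplodeJSONArray extends GenericUDTF {

    @Override
    public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {
        // 1. Validate the arguments: exactly one is expected
        if (argOIs.length != 1) {
            throw new UDFArgumentException("explode_json_array takes exactly one argument");
        }
        // 2. The argument must be a string
        // Check that the argument is a primitive type
        if (argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
            throw new UDFArgumentException("explode_json_array only accepts primitive-type arguments");
        }
        // Cast the argument's object inspector to a primitive object inspector
        PrimitiveObjectInspector argumentOI = (PrimitiveObjectInspector) argOIs[0];
        // Check that the primitive type is STRING
        if (argumentOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
            throw new UDFArgumentException("explode_json_array only accepts a string argument");
        }
        // 3. Define the name and type of the output column
        List<String> fieldNames = new ArrayList<String>();
        List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();
        fieldNames.add("items");
        fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);
        return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
    }

    @Override
    public void process(Object[] objects) throws HiveException {
        // 1. Get the input value
        String jsonArray = objects[0].toString();
        // 2. Parse the string into a JSON array
        JSONArray actions = new JSONArray(jsonArray);
        // 3. Emit one row per array element
        for (int i = 0; i < actions.length(); i++) {
            String[] result = new String[1];
            result[0] = actions.getString(i);
            forward(result);
        }
    }

    @Override
    public void close() throws HiveException {
    }
}
Create the function
Package the jar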
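Upload hivefunction-1.0-SNAPSHOT.jar to /opt/module on hadoop102, then upload the jar to the /user/hive/jars path on HDFS
hadoop fs -mkdir -p /user/hive/jars
hadoop fs -put hivefunction-1.0-SNAPSHOT.jar /user/hive/jars
Create a permanent function bound to the compiled Java class. The jar path given in `using jar` must match the file that was actually uploaded to HDFS.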
create function explode_json_array as 'top.damoncai.udtf.ExplodeJSONArray' using jar 'hdfs://ha01:8020/user/hive/jars/02_hive_udtf-1.0-SNAPSHOT.jar';
What if the custom function is modified and the jar is rebuilt? Just replace the old jar at the HDFS path and restart the Hive client.
Data import
insert overwrite table dwd_action_log partition(dt='2020-06-14') select get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.during_time'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(action,'$.action_id'), get_json_object(action,'$.item'), get_json_object(action,'$.item_type'), get_json_object(action,'$.ts') from ods_log lateral view explode_json_array(get_json_object(line,'$.actions')) tmp as action where dt='2020-06-14' and get_json_object(line,'$.actions') is not null;
View the data
select * from dwd_action_log where dt='2020-06-14' limit 2;
# Display (exposure) log table
**Display log parsing approach:** Each row in the display log table corresponds to one display record, which should contain the common information, the page information, and the display information. First filter out the log lines that contain the displays field, then use the UDTF to "explode" the displays array (similar to what the explode function does), and finally parse each field with the get_json_object function.
Table creation statement
DROP TABLE IF EXISTS dwd_display_log;
CREATE EXTERNAL TABLE dwd_display_log(
    `area_code` STRING COMMENT 'Area code',
    `brand` STRING COMMENT 'Phone brand',
    `channel` STRING COMMENT 'Channel',
    `is_new` STRING COMMENT 'Whether this is the first launch',
    `model` STRING COMMENT 'Phone model',
    `mid_id` STRING COMMENT 'Device ID',
    `os` STRING COMMENT 'Operating system',
    `user_id` STRING COMMENT 'Member ID',
    `version_code` STRING COMMENT 'App version',
    `during_time` BIGINT COMMENT 'Duration in milliseconds',
    `page_item` STRING COMMENT 'Target ID',
    `page_item_type` STRING COMMENT 'Target type',
    `last_page_id` STRING COMMENT 'Previous page ID',
    `page_id` STRING COMMENT 'Page ID',
    `source_type` STRING COMMENT 'Source type',
    `ts` BIGINT COMMENT 'Timestamp',
    `display_type` STRING COMMENT 'Display type',
    `item` STRING COMMENT 'Displayed object ID',
    `item_type` STRING COMMENT 'Displayed object type',
    `order` BIGINT COMMENT 'Display order',
    `pos_id` BIGINT COMMENT 'Display position'
) COMMENT 'Display log table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_display_log'
TBLPROPERTIES('parquet.compression'='lzo');
Data import
insert overwrite table dwd_display_log partition(dt='2020-06-14') select get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.during_time'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(line,'$.ts'), get_json_object(display,'$.display_type'), get_json_object(display,'$.item'), get_json_object(display,'$.item_type'), get_json_object(display,'$.order'), get_json_object(display,'$.pos_id') from ods_log lateral view explode_json_array(get_json_object(line,'$.displays')) tmp as display where dt='2020-06-14' and get_json_object(line,'$.displays') is not null;
View the data
select * from dwd_display_log where dt='2020-06-14' limit 2;
# Error log table
**Error log parsing approach:** Each row in the error log table corresponds to one error record. To make errors easier to locate, an error record should contain the corresponding common, page, display, action, start, and error information. First filter out the log lines that contain the err field, then parse all fields with the get_json_object function.
Table creation statement
DROP TABLE IF EXISTS dwd_error_log;
CREATE EXTERNAL TABLE dwd_error_log(
    `area_code` STRING COMMENT 'Area code',
    `brand` STRING COMMENT 'Phone brand',
    `channel` STRING COMMENT 'Channel',
    `is_new` STRING COMMENT 'Whether this is the first launch',
    `model` STRING COMMENT 'Phone model',
    `mid_id` STRING COMMENT 'Device ID',
    `os` STRING COMMENT 'Operating system',
    `user_id` STRING COMMENT 'Member ID',
    `version_code` STRING COMMENT 'App version',
    `page_item` STRING COMMENT 'Target ID',
    `page_item_type` STRING COMMENT 'Target type',
    `last_page_id` STRING COMMENT 'Previous page ID',
    `page_id` STRING COMMENT 'Page ID',
    `source_type` STRING COMMENT 'Source type',
    `entry` STRING COMMENT 'Entry point: icon (app icon), notice (notification), install (launch after install)',
    `loading_time` STRING COMMENT 'Start loading time',
    `open_ad_id` STRING COMMENT 'Ad page ID',
    `open_ad_ms` STRING COMMENT 'Total ad play time',
    `open_ad_skip_ms` STRING COMMENT 'Time at which the user skipped the ad',
    `actions` STRING COMMENT 'Actions',
    `displays` STRING COMMENT 'Displays',
    `ts` STRING COMMENT 'Timestamp',
    `error_code` STRING COMMENT 'Error code',
    `msg` STRING COMMENT 'Error message'
) COMMENT 'Error log table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_error_log'
TBLPROPERTIES('parquet.compression'='lzo');
Note: the actions and displays arrays are not exploded here. To analyze how an error relates to an individual action or display, first apply explode_json_array.
Data import
insert overwrite table dwd_error_log partition(dt='2020-06-14') select get_json_object(line,'$.common.ar'), get_json_object(line,'$.common.ba'), get_json_object(line,'$.common.ch'), get_json_object(line,'$.common.is_new'), get_json_object(line,'$.common.md'), get_json_object(line,'$.common.mid'), get_json_object(line,'$.common.os'), get_json_object(line,'$.common.uid'), get_json_object(line,'$.common.vc'), get_json_object(line,'$.page.item'), get_json_object(line,'$.page.item_type'), get_json_object(line,'$.page.last_page_id'), get_json_object(line,'$.page.page_id'), get_json_object(line,'$.page.source_type'), get_json_object(line,'$.start.entry'), get_json_object(line,'$.start.loading_time'), get_json_object(line,'$.start.open_ad_id'), get_json_object(line,'$.start.open_ad_ms'), get_json_object(line,'$.start.open_ad_skip_ms'), get_json_object(line,'$.actions'), get_json_object(line,'$.displays'), get_json_object(line,'$.ts'), get_json_object(line,'$.err.error_code'), get_json_object(line,'$.err.msg') from ods_log where dt='2020-06-14' and get_json_object(line,'$.err') is not null;
View the data
select * from dwd_error_log where dt='2020-06-14' limit 2;
# DWD-layer user behavior data loading script
Writing the script
Create the script ods_to_dwd_log.sh in the /home/damoncai/bin directory on ha01
#!/bin/bash

APP=gmall

# Use the date passed as the second argument; otherwise default to yesterday
if [ -n "$2" ]; then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

dwd_start_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_start_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.ts')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.start') is not null;"

dwd_page_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_page_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.page') is not null;"

dwd_action_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_action_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(action,'$.action_id'),
    get_json_object(action,'$.item'),
    get_json_object(action,'$.item_type'),
    get_json_object(action,'$.ts')
from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.actions')) tmp as action
where dt='$do_date'
and get_json_object(line,'$.actions') is not null;"

dwd_display_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_display_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.during_time'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.ts'),
    get_json_object(display,'$.display_type'),
    get_json_object(display,'$.item'),
    get_json_object(display,'$.item_type'),
    get_json_object(display,'$.order'),
    get_json_object(display,'$.pos_id')
from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.displays')) tmp as display
where dt='$do_date'
and get_json_object(line,'$.displays') is not null;"

dwd_error_log="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_error_log partition(dt='$do_date')
select
    get_json_object(line,'$.common.ar'),
    get_json_object(line,'$.common.ba'),
    get_json_object(line,'$.common.ch'),
    get_json_object(line,'$.common.is_new'),
    get_json_object(line,'$.common.md'),
    get_json_object(line,'$.common.mid'),
    get_json_object(line,'$.common.os'),
    get_json_object(line,'$.common.uid'),
    get_json_object(line,'$.common.vc'),
    get_json_object(line,'$.page.item'),
    get_json_object(line,'$.page.item_type'),
    get_json_object(line,'$.page.last_page_id'),
    get_json_object(line,'$.page.page_id'),
    get_json_object(line,'$.page.source_type'),
    get_json_object(line,'$.start.entry'),
    get_json_object(line,'$.start.loading_time'),
    get_json_object(line,'$.start.open_ad_id'),
    get_json_object(line,'$.start.open_ad_ms'),
    get_json_object(line,'$.start.open_ad_skip_ms'),
    get_json_object(line,'$.actions'),
    get_json_object(line,'$.displays'),
    get_json_object(line,'$.ts'),
    get_json_object(line,'$.err.error_code'),
    get_json_object(line,'$.err.msg')
from ${APP}.ods_log
where dt='$do_date'
and get_json_object(line,'$.err') is not null;"

case $1 in
    dwd_start_log )
        hive -e "$dwd_start_log"
    ;;
    dwd_page_log )
        hive -e "$dwd_page_log"
    ;;
    dwd_action_log )
        hive -e "$dwd_action_log"
    ;;
    dwd_display_log )
        hive -e "$dwd_display_log"
    ;;
    dwd_error_log )
        hive -e "$dwd_error_log"
    ;;
    all )
        hive -e "$dwd_start_log$dwd_page_log$dwd_action_log$dwd_display_log$dwd_error_log"
    ;;
esac
Grant execute permission
chmod +x ods_to_dwd_log.sh
Run the script
ods_to_dwd_log.sh all 2020-06-14
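The script takes the table name as its first argument and an optional date as its second; when the date is omitted it falls back to the previous day. A minimal Python sketch of that defaulting rule follows. The function name `resolve_do_date` and the `today` parameter are hypothetical, introduced here only so the behavior can be checked without depending on the system clock.

```python
from datetime import date, timedelta


def resolve_do_date(args, today=None):
    """Return args[1] if a date was passed, else the day before `today` (YYYY-MM-DD)."""
    if len(args) > 1 and args[1]:
        return args[1]
    today = today or date.today()
    return (today - timedelta(days=1)).isoformat()


# "ods_to_dwd_log.sh all 2020-06-14": the explicit date wins.
explicit = resolve_do_date(["all", "2020-06-14"])
# "ods_to_dwd_log.sh all" run on 2020-06-15: defaults to the previous day.
defaulted = resolve_do_date(["all"], today=date(2020, 6, 15))
```

This is why a scheduled run shortly after midnight loads the partition for the day that has just ended.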
# DWD layer (business data)
For business data, the main concern when building the DWD layer is dimensional modeling.
# Review fact table (transactional fact table)
Table creation statement
DROP TABLE IF EXISTS dwd_comment_info;
CREATE EXTERNAL TABLE dwd_comment_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'User ID',
    `sku_id` STRING COMMENT 'Product SKU',
    `spu_id` STRING COMMENT 'Product SPU',
    `order_id` STRING COMMENT 'Order ID',
    `appraise` STRING COMMENT 'Rating (positive, neutral, negative, default)',
    `create_time` STRING COMMENT 'Review time'
) COMMENT 'Review fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_comment_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load
insert overwrite table dwd_comment_info partition (dt) select id, user_id, sku_id, spu_id, order_id, appraise, create_time, date_format(create_time,'yyyy-MM-dd') from ods_comment_info where dt='2020-06-14';
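The first-day load uses `partition (dt)` without a fixed value, so Hive takes each row's partition from the last selected expression, here the date part of create_time. A fully dynamic insert like this generally needs `hive.exec.dynamic.partition.mode=nonstrict` set in the session; whether that is already configured on this cluster is an assumption. The routing itself can be sketched in Python, assuming create_time looks like `yyyy-MM-dd HH:mm:ss` (the sample rows are made up):

```python
from collections import defaultdict


def route_to_partitions(rows):
    """Group rows by the date part of create_time, as partition(dt) does."""
    partitions = defaultdict(list)
    for row in rows:
        dt = row["create_time"][:10]  # 'yyyy-MM-dd' prefix of the timestamp
        partitions[dt].append(row)
    return dict(partitions)


# Hypothetical history sitting in the first-day ODS partition.
rows = [
    {"id": "1", "create_time": "2020-06-10 08:15:00"},
    {"id": "2", "create_time": "2020-06-14 21:40:03"},
    {"id": "3", "create_time": "2020-06-10 23:59:59"},
]
partitions = route_to_partitions(rows)
```

The first ODS partition holds the full history, so one statement fans out into many DWD partitions. The daily load only receives that day's new rows, which is why it can write to a single static partition.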
Daily load
insert overwrite table dwd_comment_info partition(dt='2020-06-15') select id, user_id, sku_id, spu_id, order_id, appraise, create_time from ods_comment_info where dt='2020-06-15';
# Order detail fact table (transactional fact table)
Table creation statement
DROP TABLE IF EXISTS dwd_order_detail;
CREATE EXTERNAL TABLE dwd_order_detail (
    `id` STRING COMMENT 'Order detail ID',
    `order_id` STRING COMMENT 'Order ID',
    `user_id` STRING COMMENT 'User ID',
    `sku_id` STRING COMMENT 'SKU product ID',
    `province_id` STRING COMMENT 'Province ID',
    `activity_id` STRING COMMENT 'Activity ID',
    `activity_rule_id` STRING COMMENT 'Activity rule ID',
    `coupon_id` STRING COMMENT 'Coupon ID',
    `create_time` STRING COMMENT 'Creation time',
    `source_type` STRING COMMENT 'Source type',
    `source_id` STRING COMMENT 'Source ID',
    `sku_num` BIGINT COMMENT 'Product quantity',
    `original_amount` DECIMAL(16,2) COMMENT 'Original amount',
    `split_activity_amount` DECIMAL(16,2) COMMENT 'Apportioned activity discount',
    `split_coupon_amount` DECIMAL(16,2) COMMENT 'Apportioned coupon discount',
    `split_final_amount` DECIMAL(16,2) COMMENT 'Apportioned final amount'
) COMMENT 'Order detail fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_detail/'
TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load
insert overwrite table dwd_order_detail partition(dt) select od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id, oda.activity_rule_id, odc.coupon_id, od.create_time, od.source_type, od.source_id, od.sku_num, od.order_price*od.sku_num, od.split_activity_amount, od.split_coupon_amount, od.split_final_amount, date_format(create_time,'yyyy-MM-dd') from ( select * from ods_order_detail where dt='2020-06-14' )od left join ( select id, user_id, province_id from ods_order_info where dt='2020-06-14' )oi on od.order_id=oi.id left join ( select order_detail_id, activity_id, activity_rule_id from ods_order_detail_activity where dt='2020-06-14' )oda on od.id=oda.order_detail_id left join ( select order_detail_id, coupon_id from ods_order_detail_coupon where dt='2020-06-14' )odc on od.id=odc.order_detail_id;
Daily load
insert overwrite table dwd_order_detail partition(dt='2020-06-15') select od.id, od.order_id, oi.user_id, od.sku_id, oi.province_id, oda.activity_id, oda.activity_rule_id, odc.coupon_id, od.create_time, od.source_type, od.source_id, od.sku_num, od.order_price*od.sku_num, od.split_activity_amount, od.split_coupon_amount, od.split_final_amount from ( select * from ods_order_detail where dt='2020-06-15' )od left join ( select id, user_id, province_id from ods_order_info where dt='2020-06-15' )oi on od.order_id=oi.id left join ( select order_detail_id, activity_id, activity_rule_id from ods_order_detail_activity where dt='2020-06-15' )oda on od.id=oda.order_detail_id left join ( select order_detail_id, coupon_id from ods_order_detail_coupon where dt='2020-06-15' )odc on od.id=odc.order_detail_id;
# Order refund fact table (transactional fact table)
Table creation statement
DROP TABLE IF EXISTS dwd_order_refund_info;
CREATE EXTERNAL TABLE dwd_order_refund_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'User ID',
    `order_id` STRING COMMENT 'Order ID',
    `sku_id` STRING COMMENT 'Product ID',
    `province_id` STRING COMMENT 'Region ID',
    `refund_type` STRING COMMENT 'Order refund type',
    `refund_num` BIGINT COMMENT 'Number of items refunded',
    `refund_amount` DECIMAL(16,2) COMMENT 'Order refund amount',
    `refund_reason_type` STRING COMMENT 'Order refund reason type',
    `create_time` STRING COMMENT 'Order refund time'
) COMMENT 'Order refund fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_refund_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load
insert overwrite table dwd_order_refund_info partition(dt) select ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type, ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time, date_format(ri.create_time,'yyyy-MM-dd') from ( select * from ods_order_refund_info where dt='2020-06-14' )ri left join ( select id,province_id from ods_order_info where dt='2020-06-14' )oi on ri.order_id=oi.id;
Daily load
insert overwrite table dwd_order_refund_info partition(dt='2020-06-15') select ri.id, ri.user_id, ri.order_id, ri.sku_id, oi.province_id, ri.refund_type, ri.refund_num, ri.refund_amount, ri.refund_reason_type, ri.create_time from ( select * from ods_order_refund_info where dt='2020-06-15' )ri left join ( select id,province_id from ods_order_info where dt='2020-06-15' )oi on ri.order_id=oi.id;
# Cart fact table (periodic snapshot fact table, daily snapshot)
Table creation statement
DROP TABLE IF EXISTS dwd_cart_info;
CREATE EXTERNAL TABLE dwd_cart_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'User ID',
    `sku_id` STRING COMMENT 'Product ID',
    `source_type` STRING COMMENT 'Source type',
    `source_id` STRING COMMENT 'Source ID',
    `cart_price` DECIMAL(16,2) COMMENT 'Price when added to the cart',
    `is_ordered` STRING COMMENT 'Whether an order has been placed',
    `create_time` STRING COMMENT 'Creation time',
    `operate_time` STRING COMMENT 'Modification time',
    `order_time` STRING COMMENT 'Order time',
    `sku_num` BIGINT COMMENT 'Quantity added to the cart'
) COMMENT 'Cart fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_cart_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load
insert overwrite table dwd_cart_info partition(dt='2020-06-14') select id, user_id, sku_id, source_type, source_id, cart_price, is_ordered, create_time, operate_time, order_time, sku_num from ods_cart_info where dt='2020-06-14';
Daily load
insert overwrite table dwd_cart_info partition(dt='2020-06-15') select id, user_id, sku_id, source_type, source_id, cart_price, is_ordered, create_time, operate_time, order_time, sku_num from ods_cart_info where dt='2020-06-15';
# Favorites fact table (periodic snapshot fact table, daily snapshot)
Table creation statement
DROP TABLE IF EXISTS dwd_favor_info;
CREATE EXTERNAL TABLE dwd_favor_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'User ID',
    `sku_id` STRING COMMENT 'SKU ID',
    `spu_id` STRING COMMENT 'SPU ID',
    `is_cancel` STRING COMMENT 'Whether cancelled',
    `create_time` STRING COMMENT 'Time added to favorites',
    `cancel_time` STRING COMMENT 'Cancellation time'
) COMMENT 'Favorites fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_favor_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load
insert overwrite table dwd_favor_info partition(dt='2020-06-14') select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time from ods_favor_info where dt='2020-06-14';
Daily load
insert overwrite table dwd_favor_info partition(dt='2020-06-15') select id, user_id, sku_id, spu_id, is_cancel, create_time, cancel_time from ods_favor_info where dt='2020-06-15';
# Coupon usage fact table (accumulating snapshot fact table)
Table creation statement
DROP TABLE IF EXISTS dwd_coupon_use;
CREATE EXTERNAL TABLE dwd_coupon_use(
    `id` STRING COMMENT 'ID',
    `coupon_id` STRING COMMENT 'Coupon ID',
    `user_id` STRING COMMENT 'User ID',
    `order_id` STRING COMMENT 'Order ID',
    `coupon_status` STRING COMMENT 'Coupon status',
    `get_time` STRING COMMENT 'Time received',
    `using_time` STRING COMMENT 'Time used (order placed)',
    `used_time` STRING COMMENT 'Time used (payment)',
    `expire_time` STRING COMMENT 'Expiration time'
) COMMENT 'Coupon usage fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_coupon_use/'
TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load
insert overwrite table dwd_coupon_use partition(dt) select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time, coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99') from ods_coupon_use where dt='2020-06-14';
Daily load
Load logic
Load statement
insert overwrite table dwd_coupon_use partition(dt) select nvl(new.id,old.id), nvl(new.coupon_id,old.coupon_id), nvl(new.user_id,old.user_id), nvl(new.order_id,old.order_id), nvl(new.coupon_status,old.coupon_status), nvl(new.get_time,old.get_time), nvl(new.using_time,old.using_time), nvl(new.used_time,old.used_time), nvl(new.expire_time,old.expire_time), coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99') from ( select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time from dwd_coupon_use where dt='9999-99-99' )old full outer join ( select id, coupon_id, user_id, order_id, coupon_status, get_time, using_time, used_time, expire_time from ods_coupon_use where dt='2020-06-15' )new on old.id=new.id;
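The daily load statement merges the still-open rows (partition 9999-99-99) with the day's changes through a full outer join, prefers the new value for every column, and then re-derives the partition. A minimal Python sketch of that logic follows, assuming timestamps formatted as `yyyy-MM-dd HH:mm:ss`; the helper name `merge_coupon_use` and the sample rows are made up for illustration.

```python
OPEN = "9999-99-99"
FIELDS = ("coupon_id", "user_id", "order_id", "coupon_status",
          "get_time", "using_time", "used_time", "expire_time")


def merge_coupon_use(old_rows, new_rows):
    """Full outer join on id; new values win (nvl(new.x, old.x)); then pick dt."""
    old_by_id = {r["id"]: r for r in old_rows}
    new_by_id = {r["id"]: r for r in new_rows}
    merged = {}
    for cid in old_by_id.keys() | new_by_id.keys():
        old = old_by_id.get(cid, {})
        new = new_by_id.get(cid, {})
        row = {"id": cid}
        for field in FIELDS:
            row[field] = new.get(field) if new.get(field) is not None else old.get(field)
        # coalesce(used date, expire date, '9999-99-99')
        finished = row["used_time"] or row["expire_time"]
        row["dt"] = finished[:10] if finished else OPEN
        merged[cid] = row
    return merged


# Hypothetical data: coupons 1 and 2 are still open, the new day brings
# a payment for coupon 1 and a freshly received coupon 3.
old_rows = [
    {"id": "1", "coupon_id": "c1", "user_id": "u1", "get_time": "2020-06-14 09:00:00"},
    {"id": "2", "coupon_id": "c1", "user_id": "u2", "get_time": "2020-06-14 10:00:00"},
]
new_rows = [
    {"id": "1", "coupon_id": "c1", "user_id": "u1", "order_id": "o9",
     "get_time": "2020-06-14 09:00:00", "used_time": "2020-06-15 10:00:00"},
    {"id": "3", "coupon_id": "c2", "user_id": "u3", "get_time": "2020-06-15 11:00:00"},
]
merged = merge_coupon_use(old_rows, new_rows)
```

In this example coupon 1 moves to the 2020-06-15 partition, while coupons 2 and 3 stay in 9999-99-99 until they are used or expire.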
# Payment fact table (accumulating snapshot fact table)
Table creation statement
DROP TABLE IF EXISTS dwd_payment_info;
CREATE EXTERNAL TABLE dwd_payment_info (
    `id` STRING COMMENT 'ID',
    `order_id` STRING COMMENT 'Order ID',
    `user_id` STRING COMMENT 'User ID',
    `province_id` STRING COMMENT 'Region ID',
    `trade_no` STRING COMMENT 'Trade number',
    `out_trade_no` STRING COMMENT 'External trade number',
    `payment_type` STRING COMMENT 'Payment type',
    `payment_amount` DECIMAL(16,2) COMMENT 'Payment amount',
    `payment_status` STRING COMMENT 'Payment status',
    `create_time` STRING COMMENT 'Creation time', -- time the third-party payment API was called
    `callback_time` STRING COMMENT 'Completion time' -- payment completion time, i.e. the time of the successful-payment callback
) COMMENT 'Payment fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_payment_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load
insert overwrite table dwd_payment_info partition(dt) select pi.id, pi.order_id, pi.user_id, oi.province_id, pi.trade_no, pi.out_trade_no, pi.payment_type, pi.payment_amount, pi.payment_status, pi.create_time, pi.callback_time, nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99') from ( select * from ods_payment_info where dt='2020-06-14' )pi left join ( select id,province_id from ods_order_info where dt='2020-06-14' )oi on pi.order_id=oi.id;
Daily load
insert overwrite table dwd_payment_info partition(dt) select nvl(new.id,old.id), nvl(new.order_id,old.order_id), nvl(new.user_id,old.user_id), nvl(new.province_id,old.province_id), nvl(new.trade_no,old.trade_no), nvl(new.out_trade_no,old.out_trade_no), nvl(new.payment_type,old.payment_type), nvl(new.payment_amount,old.payment_amount), nvl(new.payment_status,old.payment_status), nvl(new.create_time,old.create_time), nvl(new.callback_time,old.callback_time), nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99') from ( select id, order_id, user_id, province_id, trade_no, out_trade_no, payment_type, payment_amount, payment_status, create_time, callback_time from dwd_payment_info where dt = '9999-99-99' )old full outer join ( select pi.id, pi.out_trade_no, pi.order_id, pi.user_id, oi.province_id, pi.payment_type, pi.trade_no, pi.payment_amount, pi.payment_status, pi.create_time, pi.callback_time from ( select * from ods_payment_info where dt='2020-06-15' )pi left join ( select id,province_id from ods_order_info where dt='2020-06-15' )oi on pi.order_id=oi.id )new on old.id=new.id;
# Refund payment fact table (accumulating snapshot fact table)
Table creation statement
DROP TABLE IF EXISTS dwd_refund_payment;
CREATE EXTERNAL TABLE dwd_refund_payment (
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'User ID',
    `order_id` STRING COMMENT 'Order ID',
    `sku_id` STRING COMMENT 'SKU ID',
    `province_id` STRING COMMENT 'Region ID',
    `trade_no` STRING COMMENT 'Trade number',
    `out_trade_no` STRING COMMENT 'External trade number',
    `payment_type` STRING COMMENT 'Payment type',
    `refund_amount` DECIMAL(16,2) COMMENT 'Refund amount',
    `refund_status` STRING COMMENT 'Refund status',
    `create_time` STRING COMMENT 'Creation time', -- time the third-party payment API was called
    `callback_time` STRING COMMENT 'Callback time' -- time of the payment-interface callback, i.e. when the refund completed
) COMMENT 'Refund payment fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_refund_payment/'
TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load
insert overwrite table dwd_refund_payment partition(dt) select rp.id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time, nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99') from ( select id, out_trade_no, order_id, sku_id, payment_type, trade_no, refund_amount, refund_status, create_time, callback_time from ods_refund_payment where dt='2020-06-14' )rp left join ( select id, user_id, province_id from ods_order_info where dt='2020-06-14' )oi on rp.order_id=oi.id;
Daily load
insert overwrite table dwd_refund_payment partition(dt) select nvl(new.id,old.id), nvl(new.user_id,old.user_id), nvl(new.order_id,old.order_id), nvl(new.sku_id,old.sku_id), nvl(new.province_id,old.province_id), nvl(new.trade_no,old.trade_no), nvl(new.out_trade_no,old.out_trade_no), nvl(new.payment_type,old.payment_type), nvl(new.refund_amount,old.refund_amount), nvl(new.refund_status,old.refund_status), nvl(new.create_time,old.create_time), nvl(new.callback_time,old.callback_time), nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99') from ( select id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time from dwd_refund_payment where dt='9999-99-99' )old full outer join ( select rp.id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no, payment_type, refund_amount, refund_status, create_time, callback_time from ( select id, out_trade_no, order_id, sku_id, payment_type, trade_no, refund_amount, refund_status, create_time, callback_time from ods_refund_payment where dt='2020-06-15' )rp left join ( select id, user_id, province_id from ods_order_info where dt='2020-06-15' )oi on rp.order_id=oi.id )new on old.id=new.id;
# Order fact table (accumulating snapshot fact table)
Table creation statement
DROP TABLE IF EXISTS dwd_order_info;
CREATE EXTERNAL TABLE dwd_order_info(
    `id` STRING COMMENT 'ID',
    `order_status` STRING COMMENT 'order status',
    `user_id` STRING COMMENT 'user ID',
    `province_id` STRING COMMENT 'region ID',
    `payment_way` STRING COMMENT 'payment method',
    `delivery_address` STRING COMMENT 'delivery address',
    `out_trade_no` STRING COMMENT 'external trade number',
    `tracking_no` STRING COMMENT 'tracking number',
    `create_time` STRING COMMENT 'creation time (unpaid status)',
    `payment_time` STRING COMMENT 'payment time (paid status)',
    `cancel_time` STRING COMMENT 'cancellation time (cancelled status)',
    `finish_time` STRING COMMENT 'completion time (finished status)',
    `refund_time` STRING COMMENT 'refund time (refunding status)',
    `refund_finish_time` STRING COMMENT 'refund completion time (refund finished status)',
    `expire_time` STRING COMMENT 'expiry time',
    `feight_fee` DECIMAL(16,2) COMMENT 'freight fee',
    `feight_fee_reduce` DECIMAL(16,2) COMMENT 'freight fee reduction',
    `activity_reduce_amount` DECIMAL(16,2) COMMENT 'activity reduction',
    `coupon_reduce_amount` DECIMAL(16,2) COMMENT 'coupon reduction',
    `original_amount` DECIMAL(16,2) COMMENT 'original order amount',
    `final_amount` DECIMAL(16,2) COMMENT 'final order amount'
) COMMENT 'order fact table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwd/dwd_order_info/'
TBLPROPERTIES ("parquet.compression"="lzo");
Partition planning
Data loading
First-day load
insert overwrite table dwd_order_info partition(dt)
select
    oi.id,
    oi.order_status,
    oi.user_id,
    oi.province_id,
    oi.payment_way,
    oi.delivery_address,
    oi.out_trade_no,
    oi.tracking_no,
    oi.create_time,
    times.ts['1002'] payment_time,
    times.ts['1003'] cancel_time,
    times.ts['1004'] finish_time,
    times.ts['1005'] refund_time,
    times.ts['1006'] refund_finish_time,
    oi.expire_time,
    feight_fee,
    feight_fee_reduce,
    activity_reduce_amount,
    coupon_reduce_amount,
    original_amount,
    final_amount,
    case
        when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
        when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='2020-06-14' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
        when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
        when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select
        *
    from ods_order_info
    where dt='2020-06-14'
)oi
left join
(
    select
        order_id,
        str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
    from ods_order_status_log
    where dt='2020-06-14'
    group by order_id
)times
on oi.id=times.order_id;
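The `case` expression at the end of the select picks the partition in priority order: cancellation date, then completion date plus 7 days (only if that day has been reached and no refund was requested), then refund completion date, then expiry date, and otherwise `9999-99-99`. The sketch below restates that decision as a shell function. It assumes GNU `date`, timestamps shaped like `yyyy-MM-dd HH:mm:ss`, and an empty string for NULL; the function name `order_partition` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper mirroring the CASE expression of the first-day load.
# Arguments: cancel_time finish_time refund_time refund_finish_time expire_time do_date
# Assumption: GNU date is available for the "+7 days" arithmetic.
order_partition() {
    local cancel="$1" finish="$2" refund="$3" refund_finish="$4" expire="$5" do_date="$6"
    local closed
    if [ -n "$cancel" ]; then
        echo "${cancel%% *}"; return
    fi
    if [ -n "$finish" ] && [ -z "$refund" ]; then
        closed=$(date -d "${finish%% *} +7 days" +%F)
        # ISO dates compare correctly as plain strings: closed <= do_date
        if [[ ! "$closed" > "$do_date" ]]; then
            echo "$closed"; return
        fi
    fi
    if [ -n "$refund_finish" ]; then echo "${refund_finish%% *}"; return; fi
    if [ -n "$expire" ]; then echo "${expire%% *}"; return; fi
    echo "9999-99-99"
}

# Finished on 2020-06-01 without a refund request: closed 7 days later
order_partition "" "2020-06-01 08:00:00" "" "" "" "2020-06-14"
# Finished on 2020-06-12: the 7-day window is still open on 2020-06-14
order_partition "" "2020-06-12 08:00:00" "" "" "" "2020-06-14"
```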
Daily load
insert overwrite table dwd_order_info partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.order_status,old.order_status),
    nvl(new.user_id,old.user_id),
    nvl(new.province_id,old.province_id),
    nvl(new.payment_way,old.payment_way),
    nvl(new.delivery_address,old.delivery_address),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.tracking_no,old.tracking_no),
    nvl(new.create_time,old.create_time),
    nvl(new.payment_time,old.payment_time),
    nvl(new.cancel_time,old.cancel_time),
    nvl(new.finish_time,old.finish_time),
    nvl(new.refund_time,old.refund_time),
    nvl(new.refund_finish_time,old.refund_finish_time),
    nvl(new.expire_time,old.expire_time),
    nvl(new.feight_fee,old.feight_fee),
    nvl(new.feight_fee_reduce,old.feight_fee_reduce),
    nvl(new.activity_reduce_amount,old.activity_reduce_amount),
    nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
    nvl(new.original_amount,old.original_amount),
    nvl(new.final_amount,old.final_amount),
    case
        when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
        when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='2020-06-15' and new.refund_time is null then '2020-06-15'
        when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
        when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select
        id,
        order_status,
        user_id,
        province_id,
        payment_way,
        delivery_address,
        out_trade_no,
        tracking_no,
        create_time,
        payment_time,
        cancel_time,
        finish_time,
        refund_time,
        refund_finish_time,
        expire_time,
        feight_fee,
        feight_fee_reduce,
        activity_reduce_amount,
        coupon_reduce_amount,
        original_amount,
        final_amount
    from dwd_order_info
    where dt='9999-99-99'
)old
full outer join
(
    select
        oi.id,
        oi.order_status,
        oi.user_id,
        oi.province_id,
        oi.payment_way,
        oi.delivery_address,
        oi.out_trade_no,
        oi.tracking_no,
        oi.create_time,
        times.ts['1002'] payment_time,
        times.ts['1003'] cancel_time,
        times.ts['1004'] finish_time,
        times.ts['1005'] refund_time,
        times.ts['1006'] refund_finish_time,
        oi.expire_time,
        feight_fee,
        feight_fee_reduce,
        activity_reduce_amount,
        coupon_reduce_amount,
        original_amount,
        final_amount
    from
    (
        select
            *
        from ods_order_info
        where dt='2020-06-15'
    )oi
    left join
    (
        select
            order_id,
            str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
        from ods_order_status_log
        where dt='2020-06-15'
        group by order_id
    )times
    on oi.id=times.order_id
)new
on old.id=new.id;
# DWD layer business data first-day load script
Create the script ods_to_dwd_db_init.sh in the /home/damoncai/bin directory
vim ods_to_dwd_db_init.sh
#!/bin/bash

APP=gmall

if [ -n "$2" ] ;then
    do_date=$2
else
    echo "Please pass in a date argument"
    exit
fi

dwd_order_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_info partition(dt)
select
    oi.id,
    oi.order_status,
    oi.user_id,
    oi.province_id,
    oi.payment_way,
    oi.delivery_address,
    oi.out_trade_no,
    oi.tracking_no,
    oi.create_time,
    times.ts['1002'] payment_time,
    times.ts['1003'] cancel_time,
    times.ts['1004'] finish_time,
    times.ts['1005'] refund_time,
    times.ts['1006'] refund_finish_time,
    oi.expire_time,
    feight_fee,
    feight_fee_reduce,
    activity_reduce_amount,
    coupon_reduce_amount,
    original_amount,
    final_amount,
    case
        when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
        when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='$do_date' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
        when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
        when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select *
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
left join
(
    select
        order_id,
        str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
    from ${APP}.ods_order_status_log
    where dt='$do_date'
    group by order_id
)times
on oi.id=times.order_id;"

dwd_order_detail="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_detail partition(dt)
select
    od.id,
    od.order_id,
    oi.user_id,
    od.sku_id,
    oi.province_id,
    oda.activity_id,
    oda.activity_rule_id,
    odc.coupon_id,
    od.create_time,
    od.source_type,
    od.source_id,
    od.sku_num,
    od.order_price*od.sku_num,
    od.split_activity_amount,
    od.split_coupon_amount,
    od.split_final_amount,
    date_format(create_time,'yyyy-MM-dd')
from
(
    select *
    from ${APP}.ods_order_detail
    where dt='$do_date'
)od
left join
(
    select id, user_id, province_id
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
on od.order_id=oi.id
left join
(
    select order_detail_id, activity_id, activity_rule_id
    from ${APP}.ods_order_detail_activity
    where dt='$do_date'
)oda
on od.id=oda.order_detail_id
left join
(
    select order_detail_id, coupon_id
    from ${APP}.ods_order_detail_coupon
    where dt='$do_date'
)odc
on od.id=odc.order_detail_id;"

dwd_payment_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_payment_info partition(dt)
select
    pi.id,
    pi.order_id,
    pi.user_id,
    oi.province_id,
    pi.trade_no,
    pi.out_trade_no,
    pi.payment_type,
    pi.payment_amount,
    pi.payment_status,
    pi.create_time,
    pi.callback_time,
    nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select *
    from ${APP}.ods_payment_info
    where dt='$do_date'
)pi
left join
(
    select id,province_id
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
on pi.order_id=oi.id;"

dwd_cart_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
select
    id,
    user_id,
    sku_id,
    source_type,
    source_id,
    cart_price,
    is_ordered,
    create_time,
    operate_time,
    order_time,
    sku_num
from ${APP}.ods_cart_info
where dt='$do_date';"

dwd_comment_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_comment_info partition(dt)
select
    id,
    user_id,
    sku_id,
    spu_id,
    order_id,
    appraise,
    create_time,
    date_format(create_time,'yyyy-MM-dd')
from ${APP}.ods_comment_info
where dt='$do_date';
"

dwd_favor_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
select
    id,
    user_id,
    sku_id,
    spu_id,
    is_cancel,
    create_time,
    cancel_time
from ${APP}.ods_favor_info
where dt='$do_date';"

dwd_coupon_use="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_coupon_use partition(dt)
select
    id,
    coupon_id,
    user_id,
    order_id,
    coupon_status,
    get_time,
    using_time,
    used_time,
    expire_time,
    coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
from ${APP}.ods_coupon_use
where dt='$do_date';"

dwd_order_refund_info="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_refund_info partition(dt)
select
    ri.id,
    ri.user_id,
    ri.order_id,
    ri.sku_id,
    oi.province_id,
    ri.refund_type,
    ri.refund_num,
    ri.refund_amount,
    ri.refund_reason_type,
    ri.create_time,
    date_format(ri.create_time,'yyyy-MM-dd')
from
(
    select *
    from ${APP}.ods_order_refund_info
    where dt='$do_date'
)ri
left join
(
    select id,province_id
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
on ri.order_id=oi.id;"

dwd_refund_payment="
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_refund_payment partition(dt)
select
    rp.id,
    user_id,
    order_id,
    sku_id,
    province_id,
    trade_no,
    out_trade_no,
    payment_type,
    refund_amount,
    refund_status,
    create_time,
    callback_time,
    nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id, out_trade_no, order_id, sku_id, payment_type,
        trade_no, refund_amount, refund_status, create_time, callback_time
    from ${APP}.ods_refund_payment
    where dt='$do_date'
)rp
left join
(
    select id, user_id, province_id
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
on rp.order_id=oi.id;"

case $1 in
    dwd_order_info )
        hive -e "$dwd_order_info"
    ;;
    dwd_order_detail )
        hive -e "$dwd_order_detail"
    ;;
    dwd_payment_info )
        hive -e "$dwd_payment_info"
    ;;
    dwd_cart_info )
        hive -e "$dwd_cart_info"
    ;;
    dwd_comment_info )
        hive -e "$dwd_comment_info"
    ;;
    dwd_favor_info )
        hive -e "$dwd_favor_info"
    ;;
    dwd_coupon_use )
        hive -e "$dwd_coupon_use"
    ;;
    dwd_order_refund_info )
        hive -e "$dwd_order_refund_info"
    ;;
    dwd_refund_payment )
        hive -e "$dwd_refund_payment"
    ;;
    all )
        hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
    ;;
esac
Add execute permission
chmod +x ods_to_dwd_db_init.sh
Run the script
ods_to_dwd_db_init.sh all 2020-06-14
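The script takes a table name (or `all`) as its first argument and the load date as its second. It stops when no date is given, but it does not check the date's format, so a mistyped date would simply select an empty ODS partition. A possible guard is sketched below; the function name `is_valid_date` is hypothetical, and the check assumes GNU `date`.

```shell
#!/bin/bash
# Hypothetical guard: succeeds only for a real calendar date in yyyy-MM-dd form.
# Assumption: GNU date is available to reject impossible dates such as 2020-13-40.
is_valid_date() {
    [[ "$1" =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] && date -d "$1" +%F >/dev/null 2>&1
}

for candidate in 2020-06-14 2020-6-14 2020-13-40; do
    if is_valid_date "$candidate"; then
        echo "$candidate: ok"
    else
        echo "$candidate: rejected"
    fi
done
```

Such a check could run right after `do_date` is assigned, before any `hive -e` call is made.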
# DWD layer business data daily load script
Create the script ods_to_dwd_db.sh in the /home/damoncai/bin directory
vim ods_to_dwd_db.sh
#!/bin/bash

APP=gmall

# Use the date passed as the second argument if there is one; otherwise default to yesterday
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

# If, on some day, every open record of an accumulating snapshot fact table is completed,
# no row is written to the 9999-99-99 partition, so that partition is not overwritten and
# its old rows become duplicates. This function uses the last-modified time of the
# 9999-99-99 partition to decide whether it was overwritten; if not, it is cleared manually.
clear_data(){
    current_date=`date +%F`
    current_date_timestamp=`date -d "$current_date" +%s`

    last_modified_date=`hadoop fs -ls /warehouse/gmall/dwd/$1 | grep '9999-99-99' | awk '{print $6}'`
    last_modified_date_timestamp=`date -d "$last_modified_date" +%s`

    if [[ $last_modified_date_timestamp -lt $current_date_timestamp ]]; then
        echo "clear table $1 partition(dt=9999-99-99)"
        hadoop fs -rm -r -f /warehouse/gmall/dwd/$1/dt=9999-99-99/*
    fi
}

dwd_order_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_order_info partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.order_status,old.order_status),
    nvl(new.user_id,old.user_id),
    nvl(new.province_id,old.province_id),
    nvl(new.payment_way,old.payment_way),
    nvl(new.delivery_address,old.delivery_address),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.tracking_no,old.tracking_no),
    nvl(new.create_time,old.create_time),
    nvl(new.payment_time,old.payment_time),
    nvl(new.cancel_time,old.cancel_time),
    nvl(new.finish_time,old.finish_time),
    nvl(new.refund_time,old.refund_time),
    nvl(new.refund_finish_time,old.refund_finish_time),
    nvl(new.expire_time,old.expire_time),
    nvl(new.feight_fee,old.feight_fee),
    nvl(new.feight_fee_reduce,old.feight_fee_reduce),
    nvl(new.activity_reduce_amount,old.activity_reduce_amount),
    nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
    nvl(new.original_amount,old.original_amount),
    nvl(new.final_amount,old.final_amount),
    case
        when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
        when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='$do_date' and new.refund_time is null then '$do_date'
        when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
        when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
        else '9999-99-99'
    end
from
(
    select
        id,
        order_status,
        user_id,
        province_id,
        payment_way,
        delivery_address,
        out_trade_no,
        tracking_no,
        create_time,
        payment_time,
        cancel_time,
        finish_time,
        refund_time,
        refund_finish_time,
        expire_time,
        feight_fee,
        feight_fee_reduce,
        activity_reduce_amount,
        coupon_reduce_amount,
        original_amount,
        final_amount
    from ${APP}.dwd_order_info
    where dt='9999-99-99'
)old
full outer join
(
    select
        oi.id,
        oi.order_status,
        oi.user_id,
        oi.province_id,
        oi.payment_way,
        oi.delivery_address,
        oi.out_trade_no,
        oi.tracking_no,
        oi.create_time,
        times.ts['1002'] payment_time,
        times.ts['1003'] cancel_time,
        times.ts['1004'] finish_time,
        times.ts['1005'] refund_time,
        times.ts['1006'] refund_finish_time,
        oi.expire_time,
        feight_fee,
        feight_fee_reduce,
        activity_reduce_amount,
        coupon_reduce_amount,
        original_amount,
        final_amount
    from
    (
        select *
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    left join
    (
        select
            order_id,
            str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
        from ${APP}.ods_order_status_log
        where dt='$do_date'
        group by order_id
    )times
    on oi.id=times.order_id
)new
on old.id=new.id;"

dwd_order_detail="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_detail partition(dt='$do_date')
select
    od.id,
    od.order_id,
    oi.user_id,
    od.sku_id,
    oi.province_id,
    oda.activity_id,
    oda.activity_rule_id,
    odc.coupon_id,
    od.create_time,
    od.source_type,
    od.source_id,
    od.sku_num,
    od.order_price*od.sku_num,
    od.split_activity_amount,
    od.split_coupon_amount,
    od.split_final_amount
from
(
    select *
    from ${APP}.ods_order_detail
    where dt='$do_date'
)od
left join
(
    select id, user_id, province_id
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
on od.order_id=oi.id
left join
(
    select order_detail_id, activity_id, activity_rule_id
    from ${APP}.ods_order_detail_activity
    where dt='$do_date'
)oda
on od.id=oda.order_detail_id
left join
(
    select order_detail_id, coupon_id
    from ${APP}.ods_order_detail_coupon
    where dt='$do_date'
)odc
on od.id=odc.order_detail_id;"

dwd_payment_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_payment_info partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.order_id,old.order_id),
    nvl(new.user_id,old.user_id),
    nvl(new.province_id,old.province_id),
    nvl(new.trade_no,old.trade_no),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.payment_type,old.payment_type),
    nvl(new.payment_amount,old.payment_amount),
    nvl(new.payment_status,old.payment_status),
    nvl(new.create_time,old.create_time),
    nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id, order_id, user_id, province_id, trade_no, out_trade_no,
        payment_type, payment_amount, payment_status, create_time, callback_time
    from ${APP}.dwd_payment_info
    where dt = '9999-99-99'
)old
full outer join
(
    select
        pi.id,
        pi.out_trade_no,
        pi.order_id,
        pi.user_id,
        oi.province_id,
        pi.payment_type,
        pi.trade_no,
        pi.payment_amount,
        pi.payment_status,
        pi.create_time,
        pi.callback_time
    from
    (
        select *
        from ${APP}.ods_payment_info
        where dt='$do_date'
    )pi
    left join
    (
        select id,province_id
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    on pi.order_id=oi.id
)new
on old.id=new.id;"

dwd_cart_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
select
    id,
    user_id,
    sku_id,
    source_type,
    source_id,
    cart_price,
    is_ordered,
    create_time,
    operate_time,
    order_time,
    sku_num
from ${APP}.ods_cart_info
where dt='$do_date';"

dwd_comment_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_comment_info partition(dt='$do_date')
select
    id,
    user_id,
    sku_id,
    spu_id,
    order_id,
    appraise,
    create_time
from ${APP}.ods_comment_info
where dt='$do_date';"

dwd_favor_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
select
    id,
    user_id,
    sku_id,
    spu_id,
    is_cancel,
    create_time,
    cancel_time
from ${APP}.ods_favor_info
where dt='$do_date';"

dwd_coupon_use="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_coupon_use partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.coupon_id,old.coupon_id),
    nvl(new.user_id,old.user_id),
    nvl(new.order_id,old.order_id),
    nvl(new.coupon_status,old.coupon_status),
    nvl(new.get_time,old.get_time),
    nvl(new.using_time,old.using_time),
    nvl(new.used_time,old.used_time),
    nvl(new.expire_time,old.expire_time),
    coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id, coupon_id, user_id, order_id, coupon_status,
        get_time, using_time, used_time, expire_time
    from ${APP}.dwd_coupon_use
    where dt='9999-99-99'
)old
full outer join
(
    select
        id, coupon_id, user_id, order_id, coupon_status,
        get_time, using_time, used_time, expire_time
    from ${APP}.ods_coupon_use
    where dt='$do_date'
)new
on old.id=new.id;"

dwd_order_refund_info="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
insert overwrite table ${APP}.dwd_order_refund_info partition(dt='$do_date')
select
    ri.id,
    ri.user_id,
    ri.order_id,
    ri.sku_id,
    oi.province_id,
    ri.refund_type,
    ri.refund_num,
    ri.refund_amount,
    ri.refund_reason_type,
    ri.create_time
from
(
    select *
    from ${APP}.ods_order_refund_info
    where dt='$do_date'
)ri
left join
(
    select id,province_id
    from ${APP}.ods_order_info
    where dt='$do_date'
)oi
on ri.order_id=oi.id;"

dwd_refund_payment="
set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
set hive.exec.dynamic.partition.mode=nonstrict;
insert overwrite table ${APP}.dwd_refund_payment partition(dt)
select
    nvl(new.id,old.id),
    nvl(new.user_id,old.user_id),
    nvl(new.order_id,old.order_id),
    nvl(new.sku_id,old.sku_id),
    nvl(new.province_id,old.province_id),
    nvl(new.trade_no,old.trade_no),
    nvl(new.out_trade_no,old.out_trade_no),
    nvl(new.payment_type,old.payment_type),
    nvl(new.refund_amount,old.refund_amount),
    nvl(new.refund_status,old.refund_status),
    nvl(new.create_time,old.create_time),
    nvl(new.callback_time,old.callback_time),
    nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
from
(
    select
        id, user_id, order_id, sku_id, province_id, trade_no, out_trade_no,
        payment_type, refund_amount, refund_status, create_time, callback_time
    from ${APP}.dwd_refund_payment
    where dt='9999-99-99'
)old
full outer join
(
    select
        rp.id,
        user_id,
        order_id,
        sku_id,
        province_id,
        trade_no,
        out_trade_no,
        payment_type,
        refund_amount,
        refund_status,
        create_time,
        callback_time
    from
    (
        select
            id, out_trade_no, order_id, sku_id, payment_type,
            trade_no, refund_amount, refund_status, create_time, callback_time
        from ${APP}.ods_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select id, user_id, province_id
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    on rp.order_id=oi.id
)new
on old.id=new.id;"

case $1 in
    dwd_order_info )
        hive -e "$dwd_order_info"
        clear_data dwd_order_info
    ;;
    dwd_order_detail )
        hive -e "$dwd_order_detail"
    ;;
    dwd_payment_info )
        hive -e "$dwd_payment_info"
        clear_data dwd_payment_info
    ;;
    dwd_cart_info )
        hive -e "$dwd_cart_info"
    ;;
    dwd_comment_info )
        hive -e "$dwd_comment_info"
    ;;
    dwd_favor_info )
        hive -e "$dwd_favor_info"
    ;;
    dwd_coupon_use )
        hive -e "$dwd_coupon_use"
        clear_data dwd_coupon_use
    ;;
    dwd_order_refund_info )
        hive -e "$dwd_order_refund_info"
    ;;
    dwd_refund_payment )
        hive -e "$dwd_refund_payment"
        clear_data dwd_refund_payment
    ;;
    all )
        hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
        clear_data dwd_order_info
        clear_data dwd_payment_info
        clear_data dwd_coupon_use
        clear_data dwd_refund_payment
    ;;
esac
Add execute permission
chmod +x ods_to_dwd_db.sh
Run the script
ods_to_dwd_db.sh all 2020-06-14
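The `clear_data` function in the script compares the last-modified date of the `9999-99-99` partition directory with the start of the current day: if the directory was last written before today, the day's `insert overwrite` did not touch it, and its stale rows have to be removed. The comparison can be tried in isolation, without a Hadoop cluster, with the sketch below. The function name `is_stale` and its two-argument form are assumptions made for illustration; GNU `date` is assumed as in the script itself.

```shell
#!/bin/bash
# Hypothetical stand-alone version of the timestamp check inside clear_data.
# $1 = last-modified date of the 9999-99-99 directory (yyyy-MM-dd)
# $2 = current date (defaults to today)
is_stale() {
    local last_modified="$1" today="${2:-$(date +%F)}"
    local last_modified_ts today_ts
    last_modified_ts=$(date -d "$last_modified" +%s)
    today_ts=$(date -d "$today" +%s)
    # Stale when the partition was last written before the start of today
    [ "$last_modified_ts" -lt "$today_ts" ]
}

if is_stale 2020-06-14 2020-06-15; then
    echo "partition not overwritten today: clear it"
fi
if ! is_stale 2020-06-15 2020-06-15; then
    echo "partition overwritten today: keep it"
fi
```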
# Data warehouse construction - DWS layer
# Visitor topic
Table creation statement
DROP TABLE IF EXISTS dws_visitor_action_daycount;
CREATE EXTERNAL TABLE dws_visitor_action_daycount
(
    `mid_id` STRING COMMENT 'device ID',
    `brand` STRING COMMENT 'device brand',
    `model` STRING COMMENT 'device model',
    `is_new` STRING COMMENT 'whether this is the first visit',
    `channel` ARRAY<STRING> COMMENT 'channel',
    `os` ARRAY<STRING> COMMENT 'operating system',
    `area_code` ARRAY<STRING> COMMENT 'region ID',
    `version_code` ARRAY<STRING> COMMENT 'app version',
    `visit_count` BIGINT COMMENT 'number of visits',
    `page_stats` ARRAY<STRUCT<page_id:STRING,page_count:BIGINT,during_time:BIGINT>> COMMENT 'page visit statistics'
) COMMENT 'daily device action table'
PARTITIONED BY(`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_visitor_action_daycount'
TBLPROPERTIES ("parquet.compression"="lzo");
Data loading
insert overwrite table dws_visitor_action_daycount partition(dt='2020-06-14')
select
    t1.mid_id,
    t1.brand,
    t1.model,
    t1.is_new,
    t1.channel,
    t1.os,
    t1.area_code,
    t1.version_code,
    t1.visit_count,
    t3.page_stats
from
(
    select
        mid_id,
        brand,
        model,
        -- In ods_page_log, within one day the is_new values of one device may be all 1, all 0,
        -- or a mix of 0 and 1 (uninstall and reinstall), hence this handling
        if(array_contains(collect_set(is_new),'0'),'0','1') is_new,
        collect_set(channel) channel,
        collect_set(os) os,
        collect_set(area_code) area_code,
        collect_set(version_code) version_code,
        sum(if(last_page_id is null,1,0)) visit_count
    from dwd_page_log
    where dt='2020-06-14'
    and last_page_id is null
    group by mid_id,model,brand
)t1
join
(
    select
        mid_id,
        brand,
        model,
        collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
    from
    (
        select
            mid_id,
            brand,
            model,
            page_id,
            count(*) page_count,
            sum(during_time) during_time
        from dwd_page_log
        where dt='2020-06-14'
        group by mid_id,model,brand,page_id
    )t2
    group by mid_id,model,brand
)t3
on t1.mid_id=t3.mid_id
and t1.brand=t3.brand
and t1.model=t3.model;
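The `is_new` column is reduced to one value per device per day: if any of the day's log rows carries `0`, the device counts as a returning one, and only an all-`1` day yields `1`. A small shell sketch of the same reduction, where the function name `daily_is_new` is hypothetical and the arguments are the day's `is_new` flags for one device:

```shell
#!/bin/bash
# Hypothetical helper mirroring
#   if(array_contains(collect_set(is_new),'0'),'0','1')
# Arguments: every is_new flag logged for one device on one day.
daily_is_new() {
    local flag
    for flag in "$@"; do
        if [ "$flag" = "0" ]; then
            # A single 0 is enough to mark the device as not new
            echo 0
            return
        fi
    done
    echo 1
}

daily_is_new 1 1 1
daily_is_new 1 0 1
```

The first call prints `1` and the second prints `0`, which covers the mixed case described in the SQL comment.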
# User topic
Table creation statement
DROP TABLE IF EXISTS dws_user_action_daycount;
CREATE EXTERNAL TABLE dws_user_action_daycount
(
    `user_id` STRING COMMENT 'user ID',
    `login_count` BIGINT COMMENT 'number of logins',
    `cart_count` BIGINT COMMENT 'number of add-to-cart actions',
    `favor_count` BIGINT COMMENT 'number of favorites added',
    `order_count` BIGINT COMMENT 'number of orders',
    `order_activity_count` BIGINT COMMENT 'number of orders that used an activity',
    `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'order reduction amount (activity)',
    `order_coupon_count` BIGINT COMMENT 'number of orders that used a coupon',
    `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'order reduction amount (coupon)',
    `order_original_amount` DECIMAL(16,2) COMMENT 'original order amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'total order amount',
    `payment_count` BIGINT COMMENT 'number of payments',
    `payment_amount` DECIMAL(16,2) COMMENT 'payment amount',
    `refund_order_count` BIGINT COMMENT 'number of order refund requests',
    `refund_order_num` BIGINT COMMENT 'number of items in order refund requests',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'order refund request amount',
    `refund_payment_count` BIGINT COMMENT 'number of refund payments',
    `refund_payment_num` BIGINT COMMENT 'number of items in refund payments',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'refund payment amount',
    `coupon_get_count` BIGINT COMMENT 'number of coupons received',
    `coupon_using_count` BIGINT COMMENT 'number of coupons used (at order time)',
    `coupon_used_count` BIGINT COMMENT 'number of coupons used (at payment time)',
    `appraise_good_count` BIGINT COMMENT 'number of positive reviews',
    `appraise_mid_count` BIGINT COMMENT 'number of neutral reviews',
    `appraise_bad_count` BIGINT COMMENT 'number of negative reviews',
    `appraise_default_count` BIGINT COMMENT 'number of default reviews',
    `order_detail_stats` array<struct<sku_id:string,sku_num:bigint,order_count:bigint,activity_reduce_amount:decimal(16,2),coupon_reduce_amount:decimal(16,2),original_amount:decimal(16,2),final_amount:decimal(16,2)>> COMMENT 'order detail statistics'
) COMMENT 'daily user actions'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_user_action_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
Data loading
First-day load
with
tmp_login as
(
    select
        dt,
        user_id,
        count(*) login_count
    from dwd_page_log
    where user_id is not null
    and last_page_id is null
    group by dt,user_id
),
tmp_cf as
(
    select
        dt,
        user_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where user_id is not null
    and action_id in ('cart_add','favor_add')
    group by dt,user_id
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) order_count,
        sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
        sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
        sum(activity_reduce_amount) order_activity_reduce_amount,
        sum(coupon_reduce_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    group by date_format(callback_time,'yyyy-MM-dd'),user_id
),
tmp_ri as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        rp.user_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(rp.refund_amount) refund_payment_amount
    from
    (
        select user_id, order_id, sku_id, refund_amount, callback_time
        from dwd_refund_payment
    )rp
    left join
    (
        select user_id, order_id, sku_id, refund_num
        from dwd_order_refund_info
    )ri
    -- match each refund payment to its refund order by order and sku
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by date_format(callback_time,'yyyy-MM-dd'),rp.user_id
),
tmp_coupon as
(
    select
        coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt) dt,
        coalesce(coupon_get.user_id,coupon_using.user_id,coupon_used.user_id) user_id,
        nvl(coupon_get_count,0) coupon_get_count,
        nvl(coupon_using_count,0) coupon_using_count,
        nvl(coupon_used_count,0) coupon_used_count
    from
    (
        select date_format(get_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_get_count
        from dwd_coupon_use
        where get_time is not null
        group by user_id,date_format(get_time,'yyyy-MM-dd')
    )coupon_get
    full outer join
    (
        select date_format(using_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_using_count
        from dwd_coupon_use
        where using_time is not null
        group by user_id,date_format(using_time,'yyyy-MM-dd')
    )coupon_using
    on coupon_get.dt=coupon_using.dt
    and coupon_get.user_id=coupon_using.user_id
    full outer join
    (
        select date_format(used_time,'yyyy-MM-dd') dt, user_id, count(*) coupon_used_count
        from dwd_coupon_use
        where used_time is not null
        group by user_id,date_format(used_time,'yyyy-MM-dd')
    )coupon_used
    on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
    and nvl(coupon_get.user_id,coupon_using.user_id)=coupon_used.user_id
),
tmp_comment as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_od as
(
    select
        dt,
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
    from
    (
        select
            date_format(create_time,'yyyy-MM-dd') dt,
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
            cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
            cast(sum(original_amount) as decimal(16,2)) original_amount,
            cast(sum(split_final_amount) as decimal(16,2)) final_amount
        from dwd_order_detail
        group by date_format(create_time,'yyyy-MM-dd'),user_id,sku_id
    )t1
    group by dt,user_id
)
insert overwrite table dws_user_action_daycount partition(dt)
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0),
    nvl(cart_count,0),
    nvl(favor_count,0),
    nvl(order_count,0),
    nvl(order_activity_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_count,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0),
    nvl(coupon_using_count,0),
    nvl(coupon_used_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0),
    order_detail_stats,
    coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt,tmp_od.dt)
from tmp_login
full outer join tmp_cf
on tmp_login.user_id=tmp_cf.user_id
and tmp_login.dt=tmp_cf.dt
full outer join tmp_order
on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
and coalesce(tmp_login.dt,tmp_cf.dt)=tmp_order.dt
full outer join tmp_pay
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt)=tmp_pay.dt
full outer join tmp_ri
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt)=tmp_ri.dt
full outer join tmp_rp
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt)=tmp_rp.dt
full outer join tmp_comment
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt)=tmp_comment.dt
full outer join tmp_coupon
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt)=tmp_coupon.dt
full outer join tmp_od
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt)=tmp_od.dt;
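The first-day load above writes to `partition(dt)` without a static value, so Hive treats the partition column as fully dynamic. Depending on the Hive version and its defaults, such an insert may be rejected unless the session runs in nonstrict dynamic-partition mode. A minimal sketch of how the settings could be prepended before the statement is handed to Hive; the helper name `with_dynamic_partition` is hypothetical and not part of the original notes:

```shell
#!/bin/sh
# Hypothetical helper: prefix a statement with the session settings that a
# fully dynamic `partition(dt)` insert usually needs in Hive.
with_dynamic_partition() {
    printf '%s\n' \
        "set hive.exec.dynamic.partition=true;" \
        "set hive.exec.dynamic.partition.mode=nonstrict;" \
        "$1"
}

# Print what would be passed to the Hive CLI, for example:
#   hive -e "$(with_dynamic_partition "$first_day_sql")"
# The statement below is shortened to a placeholder for illustration.
with_dynamic_partition "insert overwrite table dws_user_action_daycount partition(dt) select ...;"
```

The sketch only builds the text; whether to run it through `hive -e`, a file, or another client is left open.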
Daily load
with
tmp_login as
(
    select
        user_id,
        count(*) login_count
    from dwd_page_log
    where dt='2020-06-15'
    and user_id is not null
    and last_page_id is null
    group by user_id
),
tmp_cf as
(
    select
        user_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where dt='2020-06-15'
    and user_id is not null
    and action_id in ('cart_add','favor_add')
    group by user_id
),
tmp_order as
(
    select
        user_id,
        count(*) order_count,
        sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
        sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
        sum(activity_reduce_amount) order_activity_reduce_amount,
        sum(coupon_reduce_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    where (dt='2020-06-15' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='2020-06-15'
    group by user_id
),
tmp_pay as
(
    select
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    where dt='2020-06-15'
    group by user_id
),
tmp_ri as
(
    select
        user_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    where dt='2020-06-15'
    group by user_id
),
tmp_rp as
(
    select
        rp.user_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(rp.refund_amount) refund_payment_amount
    from
    (
        select user_id, order_id, sku_id, refund_amount
        from dwd_refund_payment
        where dt='2020-06-15'
    )rp
    left join
    (
        select user_id, order_id, sku_id, refund_num
        from dwd_order_refund_info
        where dt>=date_add('2020-06-15',-15)
    )ri
    -- match each refund payment to its refund order by order and sku
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by rp.user_id
),
tmp_coupon as
(
    select
        user_id,
        sum(if(date_format(get_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_get_count,
        sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_using_count,
        sum(if(date_format(used_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_used_count
    from dwd_coupon_use
    where (dt='2020-06-15' or dt='9999-99-99')
    and (date_format(get_time, 'yyyy-MM-dd') = '2020-06-15'
         or date_format(using_time,'yyyy-MM-dd')='2020-06-15'
         or date_format(used_time,'yyyy-MM-dd')='2020-06-15')
    group by user_id
),
tmp_comment as
(
    select
        user_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    where dt='2020-06-15'
    group by user_id
),
tmp_od as
(
    select
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
    from
    (
        select
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
            cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
            cast(sum(original_amount) as decimal(16,2)) original_amount,
            cast(sum(split_final_amount) as decimal(16,2)) final_amount
        from dwd_order_detail
        where dt='2020-06-15'
        group by user_id,sku_id
    )t1
    group by user_id
)
insert overwrite table dws_user_action_daycount partition(dt='2020-06-15')
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0),
    nvl(cart_count,0),
    nvl(favor_count,0),
    nvl(order_count,0),
    nvl(order_activity_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_count,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0),
    nvl(coupon_using_count,0),
    nvl(coupon_used_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0),
    order_detail_stats
from tmp_login
full outer join tmp_cf
on tmp_login.user_id=tmp_cf.user_id
full outer join tmp_order
on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
full outer join tmp_pay
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
full outer join tmp_ri
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
full outer join tmp_rp
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
full outer join tmp_comment
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
full outer join tmp_coupon
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
full outer join tmp_od
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id;
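The daily load hard-codes the example date `2020-06-15`. In a scheduled job the date would normally be a parameter. The following is a rough sketch under assumptions: the helper names `resolve_do_date` and `render_daily_sql` are hypothetical, GNU `date` is assumed for the "yesterday" default, and the SQL is treated as a template in which the example date is substituted.

```shell
#!/bin/sh
# Hypothetical helper: pick the business date for the daily load.
# Uses the first argument when it looks like yyyy-MM-dd, otherwise
# falls back to yesterday (GNU date assumed for the fallback).
resolve_do_date() {
    case "$1" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) echo "$1" ;;
        *) date -d "-1 day" +%F ;;
    esac
}

# Hypothetical helper: replace the hard-coded example date in a statement.
#   $1 = SQL text containing 2020-06-15, $2 = date to load
render_daily_sql() {
    printf '%s\n' "$1" | sed "s/2020-06-15/$2/g"
}

# Example with an explicit date; the statement is a shortened stand-in
# for the daily load.
render_daily_sql \
    "select count(*) from dwd_page_log where dt='2020-06-15';" \
    "$(resolve_do_date 2020-06-16)"
```

Substituting a literal date is the simplest option; passing the date as a Hive variable would be an alternative design.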
# Product (SKU) Topic
Table creation statement
DROP TABLE IF EXISTS dws_sku_action_daycount;
CREATE EXTERNAL TABLE dws_sku_action_daycount
(
    `sku_id` STRING COMMENT 'sku_id',
    `order_count` BIGINT COMMENT 'Number of times ordered',
    `order_num` BIGINT COMMENT 'Number of items ordered',
    `order_activity_count` BIGINT COMMENT 'Number of orders with an activity applied',
    `order_coupon_count` BIGINT COMMENT 'Number of orders with a coupon applied',
    `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount (activity)',
    `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount (coupon)',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount ordered',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Final amount ordered',
    `payment_count` BIGINT COMMENT 'Number of payments',
    `payment_num` BIGINT COMMENT 'Number of items paid for',
    `payment_amount` DECIMAL(16,2) COMMENT 'Amount paid',
    `refund_order_count` BIGINT COMMENT 'Number of refund orders',
    `refund_order_num` BIGINT COMMENT 'Number of items in refund orders',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'Refund order amount',
    `refund_payment_count` BIGINT COMMENT 'Number of refund payments',
    `refund_payment_num` BIGINT COMMENT 'Number of items refunded',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'Amount refunded',
    `cart_count` BIGINT COMMENT 'Number of times added to cart',
    `favor_count` BIGINT COMMENT 'Number of times favorited',
    `appraise_good_count` BIGINT COMMENT 'Number of positive reviews',
    `appraise_mid_count` BIGINT COMMENT 'Number of neutral reviews',
    `appraise_bad_count` BIGINT COMMENT 'Number of negative reviews',
    `appraise_default_count` BIGINT COMMENT 'Number of default reviews'
) COMMENT 'Daily SKU activity'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_sku_action_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
Data loading
First-day load
with
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from dwd_order_detail od
    join
    (
        select order_id, callback_time
        from dwd_payment_info
        where callback_time is not null
    )pi
    on pi.order_id=od.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),sku_id
),
tmp_ri as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select order_id, sku_id, refund_amount, callback_time
        from dwd_refund_payment
    )rp
    left join
    (
        select order_id, sku_id, refund_num
        from dwd_order_refund_info
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by date_format(callback_time,'yyyy-MM-dd'),rp.sku_id
),
tmp_cf as
(
    select
        dt,
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where action_id in ('cart_add','favor_add')
    group by dt,item
),
tmp_comment as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
)
insert overwrite table dws_sku_action_daycount partition(dt)
select
    sku_id,
    sum(order_count),
    sum(order_num),
    sum(order_activity_count),
    sum(order_coupon_count),
    sum(order_activity_reduce_amount),
    sum(order_coupon_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_num),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_num),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_num),
    sum(refund_payment_amount),
    sum(cart_count),
    sum(favor_count),
    sum(appraise_good_count),
    sum(appraise_mid_count),
    sum(appraise_bad_count),
    sum(appraise_default_count),
    dt
from
(
    select
        dt, sku_id,
        order_count, order_num, order_activity_count, order_coupon_count,
        order_activity_reduce_amount, order_coupon_reduce_amount,
        order_original_amount, order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_order
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        payment_count, payment_num, payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_pay
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_ri
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_rp
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        cart_count, favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_cf
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from tmp_comment
)t1
group by dt,sku_id;
Daily load
with
tmp_order as
(
    select
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where dt='2020-06-15'
    group by sku_id
),
tmp_pay as
(
    select
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from dwd_order_detail
    where (dt='2020-06-15' or dt=date_add('2020-06-15',-1))
    and order_id in
    (
        select order_id
        from dwd_payment_info
        where dt='2020-06-15'
    )
    group by sku_id
),
tmp_ri as
(
    select
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    where dt='2020-06-15'
    group by sku_id
),
tmp_rp as
(
    select
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select order_id, sku_id, refund_amount
        from dwd_refund_payment
        where dt='2020-06-15'
    )rp
    left join
    (
        select order_id, sku_id, refund_num
        from dwd_order_refund_info
        where dt>=date_add('2020-06-15',-15)
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by rp.sku_id
),
tmp_cf as
(
    select
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from dwd_action_log
    where dt='2020-06-15'
    and action_id in ('cart_add','favor_add')
    group by item
),
tmp_comment as
(
    select
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from dwd_comment_info
    where dt='2020-06-15'
    group by sku_id
)
insert overwrite table dws_sku_action_daycount partition(dt='2020-06-15')
select
    sku_id,
    sum(order_count),
    sum(order_num),
    sum(order_activity_count),
    sum(order_coupon_count),
    sum(order_activity_reduce_amount),
    sum(order_coupon_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_num),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_num),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_num),
    sum(refund_payment_amount),
    sum(cart_count),
    sum(favor_count),
    sum(appraise_good_count),
    sum(appraise_mid_count),
    sum(appraise_bad_count),
    sum(appraise_default_count)
from
(
    select
        sku_id,
        order_count, order_num, order_activity_count, order_coupon_count,
        order_activity_reduce_amount, order_coupon_reduce_amount,
        order_original_amount, order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_order
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        payment_count, payment_num, payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_pay
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_ri
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_rp
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        cart_count, favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_cf
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount,
        0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from tmp_comment
)t1
group by sku_id;
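The SKU load merges its sources by padding every branch with `0 <column>` for the metrics that branch does not provide, then summing per `sku_id`. Writing those padded lists by hand is repetitive and easy to get out of order. As an illustration only, the padded `select` for one branch could be generated; `padded_branch` is a hypothetical helper and the three-column metric list is shortened for the example.

```shell
#!/bin/sh
# Hypothetical helper: build one UNION ALL branch.
#   $1 = source CTE name
#   $2 = all metric columns, space separated, in target order
#   $3 = metric columns that this CTE actually provides
padded_branch() {
    src="$1"; all_cols="$2"; own_cols="$3"
    list="sku_id"
    for col in $all_cols; do
        case " $own_cols " in
            *" $col "*) list="$list, $col" ;;    # real metric from this CTE
            *)          list="$list, 0 $col" ;;  # zero placeholder
        esac
    done
    echo "select $list from $src"
}

# Shortened metric list, for illustration only.
all="order_count payment_count cart_count"
padded_branch tmp_order "$all" "order_count"
padded_branch tmp_pay   "$all" "payment_count"
```

Because every branch is built from the same ordered column list, the positions line up across the `union all`, which is what the outer `sum(...)` relies on.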
# Coupon Topic
Table creation statement
DROP TABLE IF EXISTS dws_coupon_info_daycount;
CREATE EXTERNAL TABLE dws_coupon_info_daycount(
    `coupon_id` STRING COMMENT 'Coupon ID',
    `get_count` BIGINT COMMENT 'Number of times claimed',
    `order_count` BIGINT COMMENT 'Number of times used (order placed)',
    `order_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount on orders using the coupon',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders using the coupon',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders using the coupon',
    `payment_count` BIGINT COMMENT 'Number of times used (payment made)',
    `payment_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount on payments using the coupon',
    `payment_amount` DECIMAL(16,2) COMMENT 'Total amount of payments using the coupon',
    `expire_count` BIGINT COMMENT 'Number of expirations'
) COMMENT 'Daily coupon statistics'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_coupon_info_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
Data loading
First-day load
with
tmp_cu as
(
    select
        coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt,coupon_expire.dt) dt,
        coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id,coupon_expire.coupon_id) coupon_id,
        nvl(get_count,0) get_count,
        nvl(order_count,0) order_count,
        nvl(payment_count,0) payment_count,
        nvl(expire_count,0) expire_count
    from
    (
        select date_format(get_time,'yyyy-MM-dd') dt, coupon_id, count(*) get_count
        from dwd_coupon_use
        group by date_format(get_time,'yyyy-MM-dd'),coupon_id
    )coupon_get
    full outer join
    (
        select date_format(using_time,'yyyy-MM-dd') dt, coupon_id, count(*) order_count
        from dwd_coupon_use
        where using_time is not null
        group by date_format(using_time,'yyyy-MM-dd'),coupon_id
    )coupon_using
    on coupon_get.dt=coupon_using.dt
    and coupon_get.coupon_id=coupon_using.coupon_id
    full outer join
    (
        select date_format(used_time,'yyyy-MM-dd') dt, coupon_id, count(*) payment_count
        from dwd_coupon_use
        where used_time is not null
        group by date_format(used_time,'yyyy-MM-dd'),coupon_id
    )coupon_used
    on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
    and nvl(coupon_get.coupon_id,coupon_using.coupon_id)=coupon_used.coupon_id
    full outer join
    (
        select date_format(expire_time,'yyyy-MM-dd') dt, coupon_id, count(*) expire_count
        from dwd_coupon_use
        where expire_time is not null
        group by date_format(expire_time,'yyyy-MM-dd'),coupon_id
    )coupon_expire
    on coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt)=coupon_expire.dt
    and coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id)=coupon_expire.coupon_id
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        coupon_id,
        sum(split_coupon_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where coupon_id is not null
    group by date_format(create_time,'yyyy-MM-dd'),coupon_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        coupon_id,
        sum(split_coupon_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from
    (
        select order_id, coupon_id, split_coupon_amount, split_final_amount
        from dwd_order_detail
        where coupon_id is not null
    )od
    join
    (
        select order_id, callback_time
        from dwd_payment_info
    )pi
    on od.order_id=pi.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),coupon_id
)
insert overwrite table dws_coupon_info_daycount partition(dt)
select
    coupon_id,
    sum(get_count),
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount),
    sum(expire_count),
    dt
from
(
    select
        dt, coupon_id,
        get_count, order_count,
        0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        payment_count,
        0 payment_reduce_amount, 0 payment_amount,
        expire_count
    from tmp_cu
    union all
    select
        dt, coupon_id,
        0 get_count, 0 order_count,
        order_reduce_amount, order_original_amount, order_final_amount,
        0 payment_count,
        0 payment_reduce_amount, 0 payment_amount,
        0 expire_count
    from tmp_order
    union all
    select
        dt, coupon_id,
        0 get_count, 0 order_count,
        0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count,
        payment_reduce_amount, payment_amount,
        0 expire_count
    from tmp_pay
)t1
group by dt,coupon_id;
Daily load
with
tmp_cu as
(
    select
        coupon_id,
        sum(if(date_format(get_time,'yyyy-MM-dd')='2020-06-15',1,0)) get_count,
        sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) order_count,
        sum(if(date_format(used_time,'yyyy-MM-dd')='2020-06-15',1,0)) payment_count,
        sum(if(date_format(expire_time,'yyyy-MM-dd')='2020-06-15',1,0)) expire_count
    from dwd_coupon_use
    where dt='9999-99-99'
    or dt='2020-06-15'
    group by coupon_id
),
tmp_order as
(
    select
        coupon_id,
        sum(split_coupon_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where dt='2020-06-15'
    and coupon_id is not null
    group by coupon_id
),
tmp_pay as
(
    select
        coupon_id,
        sum(split_coupon_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from dwd_order_detail
    where (dt='2020-06-15' or dt=date_add('2020-06-15',-1))
    and coupon_id is not null
    and order_id in
    (
        select order_id
        from dwd_payment_info
        where dt='2020-06-15'
    )
    group by coupon_id
)
insert overwrite table dws_coupon_info_daycount partition(dt='2020-06-15')
select
    coupon_id,
    sum(get_count),
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount),
    sum(expire_count)
from
(
    select
        coupon_id,
        get_count, order_count,
        0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        payment_count,
        0 payment_reduce_amount, 0 payment_amount,
        expire_count
    from tmp_cu
    union all
    select
        coupon_id,
        0 get_count, 0 order_count,
        order_reduce_amount, order_original_amount, order_final_amount,
        0 payment_count,
        0 payment_reduce_amount, 0 payment_amount,
        0 expire_count
    from tmp_order
    union all
    select
        coupon_id,
        0 get_count, 0 order_count,
        0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count,
        payment_reduce_amount, payment_amount,
        0 expire_count
    from tmp_pay
)t1
group by coupon_id;
# Activity Topic
Table creation statement
DROP TABLE IF EXISTS dws_activity_info_daycount;
CREATE EXTERNAL TABLE dws_activity_info_daycount(
    `activity_rule_id` STRING COMMENT 'Activity rule ID',
    `activity_id` STRING COMMENT 'Activity ID',
    `order_count` BIGINT COMMENT 'Number of orders under this activity rule',
    `order_reduce_amount` DECIMAL(16,2) COMMENT 'Order discount amount under this activity rule',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Original order amount under this activity rule',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Final order amount under this activity rule',
    `payment_count` BIGINT COMMENT 'Number of payments under this activity rule',
    `payment_reduce_amount` DECIMAL(16,2) COMMENT 'Payment discount amount under this activity rule',
    `payment_amount` DECIMAL(16,2) COMMENT 'Payment amount under this activity rule'
) COMMENT 'Daily activity statistics'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_activity_info_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
Data loading
First-day load
with
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        activity_rule_id,
        activity_id,
        count(*) order_count,
        sum(split_activity_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where activity_id is not null
    group by date_format(create_time,'yyyy-MM-dd'),activity_rule_id,activity_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        activity_rule_id,
        activity_id,
        count(*) payment_count,
        sum(split_activity_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from
    (
        select
            activity_rule_id,
            activity_id,
            order_id,
            split_activity_amount,
            split_final_amount
        from dwd_order_detail
        where activity_id is not null
    )od
    join
    (
        select
            order_id,
            callback_time
        from dwd_payment_info
    )pi
    on od.order_id=pi.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),activity_rule_id,activity_id
)
insert overwrite table dws_activity_info_daycount partition(dt)
select
    activity_rule_id,
    activity_id,
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount),
    dt
from
(
    select
        dt,
        activity_rule_id,
        activity_id,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        0 payment_count,
        0 payment_reduce_amount,
        0 payment_amount
    from tmp_order
    union all
    select
        dt,
        activity_rule_id,
        activity_id,
        0 order_count,
        0 order_reduce_amount,
        0 order_original_amount,
        0 order_final_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount
    from tmp_pay
)t1
group by dt,activity_rule_id,activity_id;
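The first-day statement writes to `partition(dt)` without a fixed value, which makes it a dynamic-partition insert. On Hive installations where `hive.exec.dynamic.partition.mode` is still `strict`, such an insert is rejected, which is presumably why the init script further down puts a `set` line in front of the same query. A minimal sketch of that prefixing step follows. The helper name `with_nonstrict` and the tiny demo statement (including the tables `demo_daycount` and `demo_src`) are hypothetical.

```shell
#!/bin/bash
# Prepend the dynamic-partition setting to a statement before it is handed to hive -e.
with_nonstrict() {
    printf 'set hive.exec.dynamic.partition.mode=nonstrict;\n%s\n' "$1"
}

# Hypothetical statement used only to show the output shape.
sql="insert overwrite table demo_daycount partition(dt) select id, cnt, dt from demo_src;"
with_nonstrict "$sql"

# To run it against a real cluster one would call, for example:
#   hive -e "$(with_nonstrict "$sql")"
```

The daily load does not need this, because it names its partition explicitly with `partition(dt='2020-06-15')`.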
Daily load
with
tmp_order as
(
    select
        activity_rule_id,
        activity_id,
        count(*) order_count,
        sum(split_activity_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from dwd_order_detail
    where dt='2020-06-15'
    and activity_id is not null
    group by activity_rule_id,activity_id
),
tmp_pay as
(
    select
        activity_rule_id,
        activity_id,
        count(*) payment_count,
        sum(split_activity_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from dwd_order_detail
    where (dt='2020-06-15' or dt=date_add('2020-06-15',-1))
    and activity_id is not null
    and order_id in
    (
        select order_id
        from dwd_payment_info
        where dt='2020-06-15'
    )
    group by activity_rule_id,activity_id
)
insert overwrite table dws_activity_info_daycount partition(dt='2020-06-15')
select
    activity_rule_id,
    activity_id,
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount)
from
(
    select
        activity_rule_id,
        activity_id,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        0 payment_count,
        0 payment_reduce_amount,
        0 payment_amount
    from tmp_order
    union all
    select
        activity_rule_id,
        activity_id,
        0 order_count,
        0 order_reduce_amount,
        0 order_original_amount,
        0 order_final_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount
    from tmp_pay
)t1
group by activity_rule_id,activity_id;
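The daily statement hard-codes `2020-06-15`, and `tmp_pay` also reads the previous day's partition of `dwd_order_detail`, presumably because an order can be paid on the day after it was placed. When such a statement is wrapped in a scheduled script, the date is usually passed in or defaulted to yesterday. A small sketch of that date handling, assuming GNU `date` (the `-d` option); `resolve_do_date` and `prev_day` are hypothetical helper names.

```shell
#!/bin/bash
# Assumes GNU date, as shipped on most Linux distributions.

# Use the date passed in, otherwise default to yesterday.
resolve_do_date() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        date -d "1 day ago" +%F
    fi
}

# Shell-side counterpart of date_add('<date>',-1) in the query.
prev_day() {
    date -d "$1 1 day ago" +%F
}

do_date=$(resolve_do_date "${1:-}")
echo "load partition:       dt='$do_date'"
echo "also scan order rows: dt='$(prev_day "$do_date")'"
```

Inside a double-quoted SQL string the literal `'2020-06-15'` would then become `'$do_date'`, while `date_add('$do_date',-1)` can stay in the query and let Hive compute the previous day.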
# Region topic
Table creation statement
DROP TABLE IF EXISTS dws_area_stats_daycount;
CREATE EXTERNAL TABLE dws_area_stats_daycount(
    `province_id` STRING COMMENT 'Region ID',
    `visit_count` BIGINT COMMENT 'Visit count',
    `login_count` BIGINT COMMENT 'Login count',
    `visitor_count` BIGINT COMMENT 'Number of visitors',
    `user_count` BIGINT COMMENT 'Number of users',
    `order_count` BIGINT COMMENT 'Order count',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Original order amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Final order amount',
    `payment_count` BIGINT COMMENT 'Payment count',
    `payment_amount` DECIMAL(16,2) COMMENT 'Payment amount',
    `refund_order_count` BIGINT COMMENT 'Refund order count',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'Refund order amount',
    `refund_payment_count` BIGINT COMMENT 'Refund payment count',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'Refund payment amount'
) COMMENT 'Daily region statistics table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dws/dws_area_stats_daycount/'
TBLPROPERTIES ("parquet.compression"="lzo");
Data loading
First-day load
with
tmp_vu as
(
    select
        dt,
        id province_id,
        visit_count,
        login_count,
        visitor_count,
        user_count
    from
    (
        select
            dt,
            area_code,
            count(*) visit_count,-- visit count of all visitors
            count(user_id) login_count,-- visit count of logged-in users, equivalent to sum(if(user_id is not null,1,0))
            count(distinct(mid_id)) visitor_count,-- number of visitors
            count(distinct(user_id)) user_count-- number of users
        from dwd_page_log
        where last_page_id is null
        group by dt,area_code
    )tmp
    left join dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) order_count,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    group by date_format(create_time,'yyyy-MM-dd'),province_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    group by date_format(callback_time,'yyyy-MM-dd'),province_id
),
tmp_ro as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) refund_order_count,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),province_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) refund_payment_count,
        sum(refund_amount) refund_payment_amount
    from dwd_refund_payment
    group by date_format(callback_time,'yyyy-MM-dd'),province_id
)
insert overwrite table dws_area_stats_daycount partition(dt)
select
    province_id,
    sum(visit_count),
    sum(login_count),
    sum(visitor_count),
    sum(user_count),
    sum(order_count),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_amount),
    dt
from
(
    select
        dt, province_id,
        visit_count, login_count, visitor_count, user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_vu
    union all
    select
        dt, province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        order_count, order_original_amount, order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_order
    union all
    select
        dt, province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        payment_count, payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_pay
    union all
    select
        dt, province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        refund_order_count, refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_ro
    union all
    select
        dt, province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        refund_payment_count, refund_payment_amount
    from tmp_rp
)t1
group by dt,province_id;
Daily load
with
tmp_vu as
(
    select
        id province_id,
        visit_count,
        login_count,
        visitor_count,
        user_count
    from
    (
        select
            area_code,
            count(*) visit_count,-- visit count of all visitors
            count(user_id) login_count,-- visit count of logged-in users, equivalent to sum(if(user_id is not null,1,0))
            count(distinct(mid_id)) visitor_count,-- number of visitors
            count(distinct(user_id)) user_count-- number of users
        from dwd_page_log
        where dt='2020-06-15'
        and last_page_id is null
        group by area_code
    )tmp
    left join dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select
        province_id,
        count(*) order_count,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from dwd_order_info
    where (dt='2020-06-15' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='2020-06-15'
    group by province_id
),
tmp_pay as
(
    select
        province_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from dwd_payment_info
    where dt='2020-06-15'
    group by province_id
),
tmp_ro as
(
    select
        province_id,
        count(*) refund_order_count,
        sum(refund_amount) refund_order_amount
    from dwd_order_refund_info
    where dt='2020-06-15'
    group by province_id
),
tmp_rp as
(
    select
        province_id,
        count(*) refund_payment_count,
        sum(refund_amount) refund_payment_amount
    from dwd_refund_payment
    where dt='2020-06-15'
    group by province_id
)
insert overwrite table dws_area_stats_daycount partition(dt='2020-06-15')
select
    province_id,
    sum(visit_count),
    sum(login_count),
    sum(visitor_count),
    sum(user_count),
    sum(order_count),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_amount)
from
(
    select
        province_id,
        visit_count, login_count, visitor_count, user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_vu
    union all
    select
        province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        order_count, order_original_amount, order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_order
    union all
    select
        province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        payment_count, payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_pay
    union all
    select
        province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        refund_order_count, refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_ro
    union all
    select
        province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        refund_payment_count, refund_payment_amount
    from tmp_rp
)t1
group by province_id;
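In the `tmp_order` filter of the daily load, how the `or` between the two partitions is grouped relative to the `and` on `create_time` matters. In HiveQL, `and` binds more tightly than `or`, so without explicit parentheses the `create_time` condition applies only to the `dt='9999-99-99'` branch. Shell arithmetic follows the same rule (`&&` binds more tightly than `||`), so it can serve as a quick stand-in to see the difference. The function names below are hypothetical, and the scenario assumes the `dt='2020-06-15'` partition can hold orders that were created on an earlier day.

```shell
#!/bin/bash
# Inputs are 0 or 1:
#   $1 = row is in partition dt='2020-06-15'
#   $2 = row is in partition dt='9999-99-99'
#   $3 = create_time falls on 2020-06-15

# a or b and c, which is evaluated as a or (b and c)
keep_without_parens() { echo $(( $1 || $2 && $3 )); }

# (a or b) and c
keep_with_parens() { echo $(( ($1 || $2) && $3 )); }

# An order stored in dt='2020-06-15' but created on an earlier day:
keep_without_parens 1 0 0   # prints 1, the row is counted
keep_with_parens 1 0 0      # prints 0, the row is filtered out
```

If the intent is to count only orders created on the load date, the parenthesized form is the one that expresses it.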
# DWS-layer first-day data load script
Write the script
Create the script dwd_to_dws_init.sh in the /home/damoncai/bin directory.
#!/bin/bash

APP=gmall

if [ -n "$2" ] ;then
    do_date=$2
else
    echo "Please pass in a date argument"
    exit
fi

dws_visitor_action_daycount="
insert overwrite table ${APP}.dws_visitor_action_daycount partition(dt='$do_date')
select
    t1.mid_id,
    t1.brand,
    t1.model,
    t1.is_new,
    t1.channel,
    t1.os,
    t1.area_code,
    t1.version_code,
    t1.visit_count,
    t3.page_stats
from
(
    select
        mid_id,
        brand,
        model,
        if(array_contains(collect_set(is_new),'0'),'0','1') is_new,-- within one day in ods_page_log, is_new for the same device may be all 1, all 0, or a mix of both (uninstall and reinstall), hence this handling
        collect_set(channel) channel,
        collect_set(os) os,
        collect_set(area_code) area_code,
        collect_set(version_code) version_code,
        sum(if(last_page_id is null,1,0)) visit_count
    from ${APP}.dwd_page_log
    where dt='$do_date'
    and last_page_id is null
    group by mid_id,model,brand
)t1
join
(
    select
        mid_id,
        brand,
        model,
        collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
    from
    (
        select
            mid_id,
            brand,
            model,
            page_id,
            count(*) page_count,
            sum(during_time) during_time
        from ${APP}.dwd_page_log
        where dt='$do_date'
        group by mid_id,model,brand,page_id
    )t2
    group by mid_id,model,brand
)t3
on t1.mid_id=t3.mid_id
and t1.brand=t3.brand
and t1.model=t3.model;
"

dws_area_stats_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_vu as
(
    select
        dt,
        id province_id,
        visit_count,
        login_count,
        visitor_count,
        user_count
    from
    (
        select
            dt,
            area_code,
            count(*) visit_count,-- visit count of all visitors
            count(user_id) login_count,-- visit count of logged-in users, equivalent to sum(if(user_id is not null,1,0))
            count(distinct(mid_id)) visitor_count,-- number of visitors
            count(distinct(user_id)) user_count-- number of users
        from ${APP}.dwd_page_log
        where last_page_id is null
        group by dt,area_code
    )tmp
    left join ${APP}.dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) order_count,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from ${APP}.dwd_order_info
    group by date_format(create_time,'yyyy-MM-dd'),province_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from ${APP}.dwd_payment_info
    group by date_format(callback_time,'yyyy-MM-dd'),province_id
),
tmp_ro as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) refund_order_count,
        sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),province_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        province_id,
        count(*) refund_payment_count,
        sum(refund_amount) refund_payment_amount
    from ${APP}.dwd_refund_payment
    group by date_format(callback_time,'yyyy-MM-dd'),province_id
)
insert overwrite table ${APP}.dws_area_stats_daycount partition(dt)
select
    province_id,
    sum(visit_count),
    sum(login_count),
    sum(visitor_count),
    sum(user_count),
    sum(order_count),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_amount),
    dt
from
(
    select
        dt, province_id,
        visit_count, login_count, visitor_count, user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_vu
    union all
    select
        dt, province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        order_count, order_original_amount, order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_order
    union all
    select
        dt, province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        payment_count, payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_pay
    union all
    select
        dt, province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        refund_order_count, refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_ro
    union all
    select
        dt, province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        refund_payment_count, refund_payment_amount
    from tmp_rp
)t1
group by dt,province_id;
"

dws_user_action_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_login as
(
    select
        dt,
        user_id,
        count(*) login_count
    from ${APP}.dwd_page_log
    where user_id is not null
    and last_page_id is null
    group by dt,user_id
),
tmp_cf as
(
    select
        dt,
        user_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from ${APP}.dwd_action_log
    where user_id is not null
    and action_id in ('cart_add','favor_add')
    group by dt,user_id
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) order_count,
        sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
        sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
        sum(activity_reduce_amount) order_activity_reduce_amount,
        sum(coupon_reduce_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from ${APP}.dwd_order_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) payment_count,
        sum(payment_amount) payment_amount
    from ${APP}.dwd_payment_info
    group by date_format(callback_time,'yyyy-MM-dd'),user_id
),
tmp_ri as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        rp.user_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(rp.refund_amount) refund_payment_amount
    from
    (
        select
            user_id,
            order_id,
            sku_id,
            refund_amount,
            callback_time
        from ${APP}.dwd_refund_payment
    )rp
    left join
    (
        select
            user_id,
            order_id,
            sku_id,
            refund_num
        from ${APP}.dwd_order_refund_info
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by date_format(callback_time,'yyyy-MM-dd'),rp.user_id
),
tmp_coupon as
(
    select
        coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt) dt,
        coalesce(coupon_get.user_id,coupon_using.user_id,coupon_used.user_id) user_id,
        nvl(coupon_get_count,0) coupon_get_count,
        nvl(coupon_using_count,0) coupon_using_count,
        nvl(coupon_used_count,0) coupon_used_count
    from
    (
        select
            date_format(get_time,'yyyy-MM-dd') dt,
            user_id,
            count(*) coupon_get_count
        from ${APP}.dwd_coupon_use
        where get_time is not null
        group by user_id,date_format(get_time,'yyyy-MM-dd')
    )coupon_get
    full outer join
    (
        select
            date_format(using_time,'yyyy-MM-dd') dt,
            user_id,
            count(*) coupon_using_count
        from ${APP}.dwd_coupon_use
        where using_time is not null
        group by user_id,date_format(using_time,'yyyy-MM-dd')
    )coupon_using
    on coupon_get.dt=coupon_using.dt
    and coupon_get.user_id=coupon_using.user_id
    full outer join
    (
        select
            date_format(used_time,'yyyy-MM-dd') dt,
            user_id,
            count(*) coupon_used_count
        from ${APP}.dwd_coupon_use
        where used_time is not null
        group by user_id,date_format(used_time,'yyyy-MM-dd')
    )coupon_used
    on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
    and nvl(coupon_get.user_id,coupon_using.user_id)=coupon_used.user_id
),
tmp_comment as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        user_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),user_id
),
tmp_od as
(
    select
        dt,
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
    from
    (
        select
            date_format(create_time,'yyyy-MM-dd') dt,
            user_id,
            sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
            cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
            cast(sum(original_amount) as decimal(16,2)) original_amount,
            cast(sum(split_final_amount) as decimal(16,2)) final_amount
        from ${APP}.dwd_order_detail
        group by date_format(create_time,'yyyy-MM-dd'),user_id,sku_id
    )t1
    group by dt,user_id
)
insert overwrite table ${APP}.dws_user_action_daycount partition(dt)
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0),
    nvl(cart_count,0),
    nvl(favor_count,0),
    nvl(order_count,0),
    nvl(order_activity_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_count,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0),
    nvl(coupon_using_count,0),
    nvl(coupon_used_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0),
    order_detail_stats,
    coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt,tmp_od.dt)
from tmp_login
full outer join tmp_cf
on tmp_login.user_id=tmp_cf.user_id
and tmp_login.dt=tmp_cf.dt
full outer join tmp_order
on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
and coalesce(tmp_login.dt,tmp_cf.dt)=tmp_order.dt
full outer join tmp_pay
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt)=tmp_pay.dt
full outer join tmp_ri
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt)=tmp_ri.dt
full outer join tmp_rp
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt)=tmp_rp.dt
full outer join tmp_comment
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt)=tmp_comment.dt
full outer join tmp_coupon
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt)=tmp_coupon.dt
full outer join tmp_od
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id
and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt)=tmp_od.dt;
"

dws_activity_info_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        activity_rule_id,
        activity_id,
        count(*) order_count,
        sum(split_activity_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where activity_id is not null
    group by date_format(create_time,'yyyy-MM-dd'),activity_rule_id,activity_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        activity_rule_id,
        activity_id,
        count(*) payment_count,
        sum(split_activity_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from
    (
        select
            activity_rule_id,
            activity_id,
            order_id,
            split_activity_amount,
            split_final_amount
        from ${APP}.dwd_order_detail
        where activity_id is not null
    )od
    join
    (
        select
            order_id,
            callback_time
        from ${APP}.dwd_payment_info
    )pi
    on od.order_id=pi.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),activity_rule_id,activity_id
)
insert overwrite table ${APP}.dws_activity_info_daycount partition(dt)
select
    activity_rule_id,
    activity_id,
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount),
    dt
from
(
    select
        dt, activity_rule_id, activity_id,
        order_count, order_reduce_amount, order_original_amount, order_final_amount,
        0 payment_count, 0 payment_reduce_amount, 0 payment_amount
    from tmp_order
    union all
    select
        dt, activity_rule_id, activity_id,
        0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        payment_count, payment_reduce_amount, payment_amount
    from tmp_pay
)t1
group by dt,activity_rule_id,activity_id;"

dws_sku_action_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail od
    join
    (
        select
            order_id,
            callback_time
        from ${APP}.dwd_payment_info
        where callback_time is not null
    )pi
    on pi.order_id=od.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),sku_id
),
tmp_ri as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
),
tmp_rp as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select
            order_id,
            sku_id,
            refund_amount,
            callback_time
        from ${APP}.dwd_refund_payment
    )rp
    left join
    (
        select
            order_id,
            sku_id,
            refund_num
        from ${APP}.dwd_order_refund_info
    )ri
    on rp.order_id=ri.order_id
    and rp.sku_id=ri.sku_id
    group by date_format(callback_time,'yyyy-MM-dd'),rp.sku_id
),
tmp_cf as
(
    select
        dt,
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from ${APP}.dwd_action_log
    where action_id in ('cart_add','favor_add')
    group by dt,item
),
tmp_comment as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_comment_info
    group by date_format(create_time,'yyyy-MM-dd'),sku_id
)
insert overwrite table ${APP}.dws_sku_action_daycount partition(dt)
select
    sku_id,
    sum(order_count),
    sum(order_num),
    sum(order_activity_count),
    sum(order_coupon_count),
    sum(order_activity_reduce_amount),
    sum(order_coupon_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_num),
    sum(payment_amount),
    sum(refund_order_count),
    sum(refund_order_num),
    sum(refund_order_amount),
    sum(refund_payment_count),
    sum(refund_payment_num),
    sum(refund_payment_amount),
    sum(cart_count),
    sum(favor_count),
    sum(appraise_good_count),
    sum(appraise_mid_count),
    sum(appraise_bad_count),
    sum(appraise_default_count),
    dt
from
(
    select
        dt, sku_id,
        order_count, order_num, order_activity_count, order_coupon_count,
        order_activity_reduce_amount, order_coupon_reduce_amount, order_original_amount, order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_order
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        payment_count, payment_num, payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_pay
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_ri
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_rp
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        cart_count, favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_cf
    union all
    select
        dt, sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from tmp_comment
)t1
group by dt,sku_id;"

dws_coupon_info_daycount="
set hive.exec.dynamic.partition.mode=nonstrict;
with
tmp_cu as
(
    select
        coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt,coupon_exprie.dt) dt,
        coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id,coupon_exprie.coupon_id) coupon_id,
        nvl(get_count,0) get_count,
        nvl(order_count,0) order_count,
        nvl(payment_count,0) payment_count,
        nvl(expire_count,0) expire_count
    from
    (
        select
            date_format(get_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) get_count
        from ${APP}.dwd_coupon_use
        group by date_format(get_time,'yyyy-MM-dd'),coupon_id
    )coupon_get
    full outer join
    (
        select
            date_format(using_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) order_count
        from ${APP}.dwd_coupon_use
        where using_time is not null
        group by date_format(using_time,'yyyy-MM-dd'),coupon_id
    )coupon_using
    on coupon_get.dt=coupon_using.dt
    and coupon_get.coupon_id=coupon_using.coupon_id
    full outer join
    (
        select
            date_format(used_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) payment_count
        from ${APP}.dwd_coupon_use
        where used_time is not null
        group by date_format(used_time,'yyyy-MM-dd'),coupon_id
    )coupon_used
    on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
    and nvl(coupon_get.coupon_id,coupon_using.coupon_id)=coupon_used.coupon_id
    full outer join
    (
        select
            date_format(expire_time,'yyyy-MM-dd') dt,
            coupon_id,
            count(*) expire_count
        from ${APP}.dwd_coupon_use
        where expire_time is not null
        group by date_format(expire_time,'yyyy-MM-dd'),coupon_id
    )coupon_exprie
    on coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt)=coupon_exprie.dt
    and coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id)=coupon_exprie.coupon_id
),
tmp_order as
(
    select
        date_format(create_time,'yyyy-MM-dd') dt,
        coupon_id,
        sum(split_coupon_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where coupon_id is not null
    group by date_format(create_time,'yyyy-MM-dd'),coupon_id
),
tmp_pay as
(
    select
        date_format(callback_time,'yyyy-MM-dd') dt,
        coupon_id,
        sum(split_coupon_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from
    (
        select
            order_id,
            coupon_id,
            split_coupon_amount,
            split_final_amount
        from ${APP}.dwd_order_detail
        where coupon_id is not null
    )od
    join
    (
        select
            order_id,
            callback_time
        from ${APP}.dwd_payment_info
    )pi
    on od.order_id=pi.order_id
    group by date_format(callback_time,'yyyy-MM-dd'),coupon_id
)
insert overwrite table ${APP}.dws_coupon_info_daycount partition(dt)
select
    coupon_id,
    sum(get_count),
    sum(order_count),
    sum(order_reduce_amount),
    sum(order_original_amount),
    sum(order_final_amount),
    sum(payment_count),
    sum(payment_reduce_amount),
    sum(payment_amount),
    sum(expire_count),
    dt
from
(
    select
        dt, coupon_id,
        get_count, order_count,
        0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        payment_count,
        0 payment_reduce_amount, 0 payment_amount,
        expire_count
    from tmp_cu
    union all
    select
        dt, coupon_id,
        0 get_count, 0 order_count,
        order_reduce_amount, order_original_amount, order_final_amount,
        0 payment_count,
        0 payment_reduce_amount, 0 payment_amount,
        0 expire_count
    from tmp_order
    union all
    select
        dt, coupon_id,
        0 get_count, 0 order_count,
        0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count,
        payment_reduce_amount, payment_amount,
        0 expire_count
    from tmp_pay
)t1
group by dt,coupon_id;
"

case $1 in
    "dws_visitor_action_daycount" )
        hive -e "$dws_visitor_action_daycount"
    ;;
    "dws_user_action_daycount" )
        hive -e "$dws_user_action_daycount"
    ;;
    "dws_activity_info_daycount" )
        hive -e "$dws_activity_info_daycount"
    ;;
    "dws_area_stats_daycount" )
        hive -e "$dws_area_stats_daycount"
    ;;
    "dws_sku_action_daycount" )
        hive -e "$dws_sku_action_daycount"
    ;;
    "dws_coupon_info_daycount" )
        hive -e "$dws_coupon_info_daycount"
    ;;
    "all" )
        hive -e "$dws_visitor_action_daycount$dws_user_action_daycount$dws_activity_info_daycount$dws_area_stats_daycount$dws_sku_action_daycount$dws_coupon_info_daycount"
    ;;
esac
Grant the script execute permission (chmod +x dwd_to_dws_init.sh)
Run the script
dwd_to_dws_init.sh all 2020-06-14
# DWS layer daily data load script
Create the script dwd_to_dws.sh in the /home/damoncai/bin directory with the following content
#!/bin/bash

APP=gmall

# Use the date passed as the second argument if there is one; otherwise default to yesterday
if [ -n "$2" ]; then
    do_date=$2
else
    do_date=$(date -d "-1 day" +%F)
fi

dws_visitor_action_daycount="
insert overwrite table ${APP}.dws_visitor_action_daycount partition(dt='$do_date')
select
    t1.mid_id, t1.brand, t1.model, t1.is_new, t1.channel, t1.os,
    t1.area_code, t1.version_code, t1.visit_count, t3.page_stats
from
(
    select
        mid_id, brand, model,
        -- Within one day the is_new values of a device in ods_page_log may be all 1, all 0, or mixed (uninstall and reinstall), hence this handling
        if(array_contains(collect_set(is_new),'0'),'0','1') is_new,
        collect_set(channel) channel,
        collect_set(os) os,
        collect_set(area_code) area_code,
        collect_set(version_code) version_code,
        sum(if(last_page_id is null,1,0)) visit_count
    from ${APP}.dwd_page_log
    where dt='$do_date' and last_page_id is null
    group by mid_id,model,brand
)t1
join
(
    select
        mid_id, brand, model,
        collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
    from
    (
        select
            mid_id, brand, model, page_id,
            count(*) page_count,
            sum(during_time) during_time
        from ${APP}.dwd_page_log
        where dt='$do_date'
        group by mid_id,model,brand,page_id
    )t2
    group by mid_id,model,brand
)t3
on t1.mid_id=t3.mid_id and t1.brand=t3.brand and t1.model=t3.model;"

dws_user_action_daycount="
with
tmp_login as
(
    select user_id, count(*) login_count
    from ${APP}.dwd_page_log
    where dt='$do_date' and user_id is not null and last_page_id is null
    group by user_id
),
tmp_cf as
(
    select
        user_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from ${APP}.dwd_action_log
    where dt='$do_date' and user_id is not null and action_id in ('cart_add','favor_add')
    group by user_id
),
tmp_order as
(
    select
        user_id,
        count(*) order_count,
        sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
        sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
        sum(activity_reduce_amount) order_activity_reduce_amount,
        sum(coupon_reduce_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from ${APP}.dwd_order_info
    where (dt='$do_date' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='$do_date'
    group by user_id
),
tmp_pay as
(
    select user_id, count(*) payment_count, sum(payment_amount) payment_amount
    from ${APP}.dwd_payment_info
    where dt='$do_date'
    group by user_id
),
tmp_ri as
(
    select
        user_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    where dt='$do_date'
    group by user_id
),
tmp_rp as
(
    select
        rp.user_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(rp.refund_amount) refund_payment_amount
    from
    (
        select user_id, order_id, sku_id, refund_amount
        from ${APP}.dwd_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select user_id, order_id, sku_id, refund_num
        from ${APP}.dwd_order_refund_info
        where dt>=date_add('$do_date',-15)
    )ri
    on rp.order_id=ri.order_id and rp.sku_id=ri.sku_id
    group by rp.user_id
),
tmp_coupon as
(
    select
        user_id,
        sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_get_count,
        sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_using_count,
        sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_used_count
    from ${APP}.dwd_coupon_use
    where (dt='$do_date' or dt='9999-99-99')
    and (date_format(get_time,'yyyy-MM-dd')='$do_date'
        or date_format(using_time,'yyyy-MM-dd')='$do_date'
        or date_format(used_time,'yyyy-MM-dd')='$do_date')
    group by user_id
),
tmp_comment as
(
    select
        user_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_comment_info
    where dt='$do_date'
    group by user_id
),
tmp_od as
(
    select
        user_id,
        collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
    from
    (
        select
            user_id, sku_id,
            sum(sku_num) sku_num,
            count(*) order_count,
            cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
            cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
            cast(sum(original_amount) as decimal(16,2)) original_amount,
            cast(sum(split_final_amount) as decimal(16,2)) final_amount
        from ${APP}.dwd_order_detail
        where dt='$do_date'
        group by user_id,sku_id
    )t1
    group by user_id
)
insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
select
    coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
    nvl(login_count,0),
    nvl(cart_count,0),
    nvl(favor_count,0),
    nvl(order_count,0),
    nvl(order_activity_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_count,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(coupon_get_count,0),
    nvl(coupon_using_count,0),
    nvl(coupon_used_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0),
    order_detail_stats
from tmp_login
full outer join tmp_cf
on tmp_login.user_id=tmp_cf.user_id
full outer join tmp_order
on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
full outer join tmp_pay
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
full outer join tmp_ri
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
full outer join tmp_rp
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
full outer join tmp_comment
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
full outer join tmp_coupon
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
full outer join tmp_od
on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id;
"

dws_activity_info_daycount="
with
tmp_order as
(
    select
        activity_rule_id, activity_id,
        count(*) order_count,
        sum(split_activity_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where dt='$do_date' and activity_id is not null
    group by activity_rule_id,activity_id
),
tmp_pay as
(
    select
        activity_rule_id, activity_id,
        count(*) payment_count,
        sum(split_activity_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and activity_id is not null
    and order_id in
    (
        select order_id from ${APP}.dwd_payment_info where dt='$do_date'
    )
    group by activity_rule_id,activity_id
)
insert overwrite table ${APP}.dws_activity_info_daycount partition(dt='$do_date')
select
    activity_rule_id, activity_id,
    sum(order_count), sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount),
    sum(payment_count), sum(payment_reduce_amount), sum(payment_amount)
from
(
    select
        activity_rule_id, activity_id,
        order_count, order_reduce_amount, order_original_amount, order_final_amount,
        0 payment_count, 0 payment_reduce_amount, 0 payment_amount
    from tmp_order
    union all
    select
        activity_rule_id, activity_id,
        0 order_count, 0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        payment_count, payment_reduce_amount, payment_amount
    from tmp_pay
)t1
group by activity_rule_id,activity_id;"

dws_sku_action_daycount="
with
tmp_order as
(
    select
        sku_id,
        count(*) order_count,
        sum(sku_num) order_num,
        sum(if(split_activity_amount>0,1,0)) order_activity_count,
        sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
        sum(split_activity_amount) order_activity_reduce_amount,
        sum(split_coupon_amount) order_coupon_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where dt='$do_date'
    group by sku_id
),
tmp_pay as
(
    select
        sku_id,
        count(*) payment_count,
        sum(sku_num) payment_num,
        sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and order_id in
    (
        select order_id from ${APP}.dwd_payment_info where dt='$do_date'
    )
    group by sku_id
),
tmp_ri as
(
    select
        sku_id,
        count(*) refund_order_count,
        sum(refund_num) refund_order_num,
        sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    where dt='$do_date'
    group by sku_id
),
tmp_rp as
(
    select
        rp.sku_id,
        count(*) refund_payment_count,
        sum(ri.refund_num) refund_payment_num,
        sum(refund_amount) refund_payment_amount
    from
    (
        select order_id, sku_id, refund_amount
        from ${APP}.dwd_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select order_id, sku_id, refund_num
        from ${APP}.dwd_order_refund_info
        where dt>=date_add('$do_date',-15)
    )ri
    on rp.order_id=ri.order_id and rp.sku_id=ri.sku_id
    group by rp.sku_id
),
tmp_cf as
(
    select
        item sku_id,
        sum(if(action_id='cart_add',1,0)) cart_count,
        sum(if(action_id='favor_add',1,0)) favor_count
    from ${APP}.dwd_action_log
    where dt='$do_date' and action_id in ('cart_add','favor_add')
    group by item
),
tmp_comment as
(
    select
        sku_id,
        sum(if(appraise='1201',1,0)) appraise_good_count,
        sum(if(appraise='1202',1,0)) appraise_mid_count,
        sum(if(appraise='1203',1,0)) appraise_bad_count,
        sum(if(appraise='1204',1,0)) appraise_default_count
    from ${APP}.dwd_comment_info
    where dt='$do_date'
    group by sku_id
)
insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
select
    sku_id,
    sum(order_count), sum(order_num), sum(order_activity_count), sum(order_coupon_count),
    sum(order_activity_reduce_amount), sum(order_coupon_reduce_amount), sum(order_original_amount), sum(order_final_amount),
    sum(payment_count), sum(payment_num), sum(payment_amount),
    sum(refund_order_count), sum(refund_order_num), sum(refund_order_amount),
    sum(refund_payment_count), sum(refund_payment_num), sum(refund_payment_amount),
    sum(cart_count), sum(favor_count),
    sum(appraise_good_count), sum(appraise_mid_count), sum(appraise_bad_count), sum(appraise_default_count)
from
(
    select
        sku_id,
        order_count, order_num, order_activity_count, order_coupon_count,
        order_activity_reduce_amount, order_coupon_reduce_amount, order_original_amount, order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_order
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        payment_count, payment_num, payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_pay
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_ri
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        0 cart_count, 0 favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_rp
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        cart_count, favor_count,
        0 appraise_good_count, 0 appraise_mid_count, 0 appraise_bad_count, 0 appraise_default_count
    from tmp_cf
    union all
    select
        sku_id,
        0 order_count, 0 order_num, 0 order_activity_count, 0 order_coupon_count,
        0 order_activity_reduce_amount, 0 order_coupon_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_num, 0 payment_amount,
        0 refund_order_count, 0 refund_order_num, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_num, 0 refund_payment_amount,
        0 cart_count, 0 favor_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from tmp_comment
)t1
group by sku_id;"

dws_coupon_info_daycount="
with
tmp_cu as
(
    select
        coupon_id,
        sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) get_count,
        sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) order_count,
        sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) payment_count,
        sum(if(date_format(expire_time,'yyyy-MM-dd')='$do_date',1,0)) expire_count
    from ${APP}.dwd_coupon_use
    where dt='9999-99-99' or dt='$do_date'
    group by coupon_id
),
tmp_order as
(
    select
        coupon_id,
        sum(split_coupon_amount) order_reduce_amount,
        sum(original_amount) order_original_amount,
        sum(split_final_amount) order_final_amount
    from ${APP}.dwd_order_detail
    where dt='$do_date' and coupon_id is not null
    group by coupon_id
),
tmp_pay as
(
    select
        coupon_id,
        sum(split_coupon_amount) payment_reduce_amount,
        sum(split_final_amount) payment_amount
    from ${APP}.dwd_order_detail
    where (dt='$do_date' or dt=date_add('$do_date',-1))
    and coupon_id is not null
    and order_id in
    (
        select order_id from ${APP}.dwd_payment_info where dt='$do_date'
    )
    group by coupon_id
)
insert overwrite table ${APP}.dws_coupon_info_daycount partition(dt='$do_date')
select
    coupon_id,
    sum(get_count), sum(order_count),
    sum(order_reduce_amount), sum(order_original_amount), sum(order_final_amount),
    sum(payment_count), sum(payment_reduce_amount), sum(payment_amount),
    sum(expire_count)
from
(
    select
        coupon_id,
        get_count, order_count,
        0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        payment_count, 0 payment_reduce_amount, 0 payment_amount,
        expire_count
    from tmp_cu
    union all
    select
        coupon_id,
        0 get_count, 0 order_count,
        order_reduce_amount, order_original_amount, order_final_amount,
        0 payment_count, 0 payment_reduce_amount, 0 payment_amount,
        0 expire_count
    from tmp_order
    union all
    select
        coupon_id,
        0 get_count, 0 order_count,
        0 order_reduce_amount, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, payment_reduce_amount, payment_amount,
        0 expire_count
    from tmp_pay
)t1
group by coupon_id;"

dws_area_stats_daycount="
with
tmp_vu as
(
    select
        id province_id,
        visit_count, login_count, visitor_count, user_count
    from
    (
        select
            area_code,
            count(*) visit_count, -- number of visits by visitors
            count(user_id) login_count, -- number of visits by logged-in users, equivalent to sum(if(user_id is not null,1,0))
            count(distinct(mid_id)) visitor_count, -- number of distinct visitors
            count(distinct(user_id)) user_count -- number of distinct users
        from ${APP}.dwd_page_log
        where dt='$do_date' and last_page_id is null
        group by area_code
    )tmp
    left join ${APP}.dim_base_province area
    on tmp.area_code=area.area_code
),
tmp_order as
(
    select
        province_id,
        count(*) order_count,
        sum(original_amount) order_original_amount,
        sum(final_amount) order_final_amount
    from ${APP}.dwd_order_info
    where (dt='$do_date' or dt='9999-99-99')
    and date_format(create_time,'yyyy-MM-dd')='$do_date'
    group by province_id
),
tmp_pay as
(
    select province_id, count(*) payment_count, sum(payment_amount) payment_amount
    from ${APP}.dwd_payment_info
    where dt='$do_date'
    group by province_id
),
tmp_ro as
(
    select province_id, count(*) refund_order_count, sum(refund_amount) refund_order_amount
    from ${APP}.dwd_order_refund_info
    where dt='$do_date'
    group by province_id
),
tmp_rp as
(
    select province_id, count(*) refund_payment_count, sum(refund_amount) refund_payment_amount
    from ${APP}.dwd_refund_payment
    where dt='$do_date'
    group by province_id
)
insert overwrite table ${APP}.dws_area_stats_daycount partition(dt='$do_date')
select
    province_id,
    sum(visit_count), sum(login_count), sum(visitor_count), sum(user_count),
    sum(order_count), sum(order_original_amount), sum(order_final_amount),
    sum(payment_count), sum(payment_amount),
    sum(refund_order_count), sum(refund_order_amount),
    sum(refund_payment_count), sum(refund_payment_amount)
from
(
    select
        province_id,
        visit_count, login_count, visitor_count, user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_vu
    union all
    select
        province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        order_count, order_original_amount, order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_order
    union all
    select
        province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        payment_count, payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_pay
    union all
    select
        province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        refund_order_count, refund_order_amount,
        0 refund_payment_count, 0 refund_payment_amount
    from tmp_ro
    union all
    select
        province_id,
        0 visit_count, 0 login_count, 0 visitor_count, 0 user_count,
        0 order_count, 0 order_original_amount, 0 order_final_amount,
        0 payment_count, 0 payment_amount,
        0 refund_order_count, 0 refund_order_amount,
        refund_payment_count, refund_payment_amount
    from tmp_rp
)t1
group by province_id;"

# Run the load for the table named in the first argument, or for every table with "all"
case $1 in
    "dws_visitor_action_daycount" )
        hive -e "$dws_visitor_action_daycount"
    ;;
    "dws_user_action_daycount" )
        hive -e "$dws_user_action_daycount"
    ;;
    "dws_activity_info_daycount" )
        hive -e "$dws_activity_info_daycount"
    ;;
    "dws_area_stats_daycount" )
        hive -e "$dws_area_stats_daycount"
    ;;
    "dws_sku_action_daycount" )
        hive -e "$dws_sku_action_daycount"
    ;;
    "dws_coupon_info_daycount" )
        hive -e "$dws_coupon_info_daycount"
    ;;
    "all" )
        hive -e "$dws_visitor_action_daycount$dws_user_action_daycount$dws_activity_info_daycount$dws_area_stats_daycount$dws_sku_action_daycount$dws_coupon_info_daycount"
    ;;
esac
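The date-defaulting step at the top of dwd_to_dws.sh can be tried on its own before running the full load. The following is only a minimal sketch: the function name resolve_do_date is hypothetical (it does not appear in the script), and it assumes GNU date, which provides the -d option.

```shell
#!/bin/bash
# Hypothetical helper, not part of dwd_to_dws.sh: decide which date to load.
# Prints the argument when it is non-empty, otherwise yesterday's date (GNU date).
resolve_do_date() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        date -d "-1 day" +%F
    fi
}

resolve_do_date 2020-06-14   # an explicit date is passed through unchanged
resolve_do_date              # no argument: falls back to yesterday
```

Defaulting to yesterday suits a job scheduled shortly after midnight, when the previous day's partition is the one that has just been completed.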
Grant the script execute permission (chmod +x dwd_to_dws.sh)
Run the script
dwd_to_dws.sh all 2020-06-14
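If several days have to be loaded, for example after a failed run, the script can be called once per date. The loop below is a sketch under assumptions rather than part of the original notes: date_range is a hypothetical helper, GNU date is assumed, and the real call is left commented out so the snippet can be run without Hive.

```shell
#!/bin/bash
# Hypothetical helper: print every date from $1 to $2 inclusive (yyyy-MM-dd), one per line.
date_range() {
    local current=$1 end=$2
    # yyyy-MM-dd strings sort chronologically, so a string comparison is sufficient
    while [[ ! "$current" > "$end" ]]; do
        echo "$current"
        current=$(date -d "$current +1 day" +%F)
    done
}

for d in $(date_range 2020-06-14 2020-06-16); do
    echo "loading $d"
    # dwd_to_dws.sh all "$d"   # uncomment where the script and Hive are available
done
```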
# Data warehouse build: DWT layer
# Visitor topic
Table creation statement
DROP TABLE IF EXISTS dwt_visitor_topic;
CREATE EXTERNAL TABLE dwt_visitor_topic
(
    `mid_id` STRING COMMENT 'Device ID',
    `brand` STRING COMMENT 'Phone brand',
    `model` STRING COMMENT 'Phone model',
    `channel` ARRAY<STRING> COMMENT 'Channel',
    `os` ARRAY<STRING> COMMENT 'Operating system',
    `area_code` ARRAY<STRING> COMMENT 'Area ID',
    `version_code` ARRAY<STRING> COMMENT 'App version',
    `visit_date_first` STRING COMMENT 'First visit time',
    `visit_date_last` STRING COMMENT 'Last visit time',
    `visit_last_1d_count` BIGINT COMMENT 'Visit count in the last 1 day',
    `visit_last_1d_day_count` BIGINT COMMENT 'Days with visits in the last 1 day',
    `visit_last_7d_count` BIGINT COMMENT 'Visit count in the last 7 days',
    `visit_last_7d_day_count` BIGINT COMMENT 'Days with visits in the last 7 days',
    `visit_last_30d_count` BIGINT COMMENT 'Visit count in the last 30 days',
    `visit_last_30d_day_count` BIGINT COMMENT 'Days with visits in the last 30 days',
    `visit_count` BIGINT COMMENT 'Cumulative visit count',
    `visit_day_count` BIGINT COMMENT 'Cumulative days with visits'
) COMMENT 'Device topic wide table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwt/dwt_visitor_topic'
TBLPROPERTIES ("parquet.compression"="lzo");
Data load
insert overwrite table dwt_visitor_topic partition(dt='2020-06-14')
select
    nvl(1d_ago.mid_id,old.mid_id),
    nvl(1d_ago.brand,old.brand),
    nvl(1d_ago.model,old.model),
    nvl(1d_ago.channel,old.channel),
    nvl(1d_ago.os,old.os),
    nvl(1d_ago.area_code,old.area_code),
    nvl(1d_ago.version_code,old.version_code),
    case
        when old.mid_id is null and 1d_ago.is_new=1 then '2020-06-14'
        when old.mid_id is null and 1d_ago.is_new=0 then '2020-06-13' -- The exact first-visit date is unknown, so use a date before the warehouse go-live day
        else old.visit_date_first
    end,
    if(1d_ago.mid_id is not null,'2020-06-14',old.visit_date_last),
    nvl(1d_ago.visit_count,0),
    if(1d_ago.mid_id is null,0,1),
    nvl(old.visit_last_7d_count,0)+nvl(1d_ago.visit_count,0)-nvl(7d_ago.visit_count,0),
    nvl(old.visit_last_7d_day_count,0)+if(1d_ago.mid_id is null,0,1)-if(7d_ago.mid_id is null,0,1),
    nvl(old.visit_last_30d_count,0)+nvl(1d_ago.visit_count,0)-nvl(30d_ago.visit_count,0),
    nvl(old.visit_last_30d_day_count,0)+if(1d_ago.mid_id is null,0,1)-if(30d_ago.mid_id is null,0,1),
    nvl(old.visit_count,0)+nvl(1d_ago.visit_count,0),
    nvl(old.visit_day_count,0)+if(1d_ago.mid_id is null,0,1)
from
(
    select
        mid_id,
        brand,
        model,
        channel,
        os,
        area_code,
        version_code,
        visit_date_first,
        visit_date_last,
        visit_last_1d_count,
        visit_last_1d_day_count,
        visit_last_7d_count,
        visit_last_7d_day_count,
        visit_last_30d_count,
        visit_last_30d_day_count,
        visit_count,
        visit_day_count
    from dwt_visitor_topic
    where dt=date_add('2020-06-14',-1)
)old
full outer join
(
    select
        mid_id,
        brand,
        model,
        is_new,
        channel,
        os,
        area_code,
        version_code,
        visit_count
    from dws_visitor_action_daycount
    where dt='2020-06-14'
)1d_ago
on old.mid_id=1d_ago.mid_id
left join
(
    select
        mid_id,
        brand,
        model,
        is_new,
        channel,
        os,
        area_code,
        version_code,
        visit_count
    from dws_visitor_action_daycount
    where dt=date_add('2020-06-14',-7)
)7d_ago
on old.mid_id=7d_ago.mid_id
left join
(
    select
        mid_id,
        brand,
        model,
        is_new,
        channel,
        os,
        area_code,
        version_code,
        visit_count
    from dws_visitor_action_daycount
    where dt=date_add('2020-06-14',-30)
)30d_ago
on old.mid_id=30d_ago.mid_id;
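The 7-day and 30-day columns in the statement above are maintained incrementally: yesterday's window total, plus today's value, minus the value that drops out of the window. The following is a minimal shell sketch of that arithmetic only. The daily counts are made up, and the helper names `window_sum` and `roll_window` are hypothetical; nothing here is part of the warehouse scripts.

```shell
#!/bin/bash
# Hypothetical daily visit counts for one visitor, oldest day first (made-up numbers).
counts=(3 1 2 5 1 0 4 6 2)

# Recompute a window from scratch: sum counts[start..end], both inclusive.
window_sum() {
    local start=$1 end=$2 total=0 i
    for ((i = start; i <= end; i++)); do
        total=$((total + counts[i]))
    done
    echo "$total"
}

# Incremental update: previous window + newest day - the day that leaves the window.
roll_window() {
    local prev=$1 newest=$2 expired=$3
    echo $((prev + newest - expired))
}

# 7-day window ending on day index 7 (indexes 1..7), playing the role of "old".
prev_7d=$(window_sum 1 7)
# Roll forward to day index 8: add counts[8] ("1d_ago"), drop counts[1] ("7d_ago").
next_7d=$(roll_window "$prev_7d" "${counts[8]}" "${counts[1]}")
# Cross-check against a full recomputation over indexes 2..8.
full_7d=$(window_sum 2 8)

echo "incremental=$next_7d recomputed=$full_7d"
```

The incremental form only reads three partitions per run (yesterday's topic row, today's daily count, and the daily count from 7 or 30 days back) instead of rescanning the whole window, which is presumably why the statement joins `7d_ago` and `30d_ago` rather than aggregating a date range.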
# User topic
Table creation statement
DROP TABLE IF EXISTS dwt_user_topic;
CREATE EXTERNAL TABLE dwt_user_topic
(
    `user_id` STRING COMMENT 'User ID',
    `login_date_first` STRING COMMENT 'First active date',
    `login_date_last` STRING COMMENT 'Last active date',
    `login_date_1d_count` BIGINT COMMENT 'Login count in the last 1 day',
    `login_last_1d_day_count` BIGINT COMMENT 'Login days in the last 1 day',
    `login_last_7d_count` BIGINT COMMENT 'Login count in the last 7 days',
    `login_last_7d_day_count` BIGINT COMMENT 'Login days in the last 7 days',
    `login_last_30d_count` BIGINT COMMENT 'Login count in the last 30 days',
    `login_last_30d_day_count` BIGINT COMMENT 'Login days in the last 30 days',
    `login_count` BIGINT COMMENT 'Cumulative login count',
    `login_day_count` BIGINT COMMENT 'Cumulative login days',
    `order_date_first` STRING COMMENT 'First order date',
    `order_date_last` STRING COMMENT 'Last order date',
    `order_last_1d_count` BIGINT COMMENT 'Order count in the last 1 day',
    `order_activity_last_1d_count` BIGINT COMMENT 'Count of orders with an activity in the last 1 day',
    `order_activity_reduce_last_1d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (activity) in the last 1 day',
    `order_coupon_last_1d_count` BIGINT COMMENT 'Count of orders using a coupon in the last 1 day',
    `order_coupon_reduce_last_1d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (coupon) in the last 1 day',
    `order_last_1d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 1 day',
    `order_last_1d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 1 day',
    `order_last_7d_count` BIGINT COMMENT 'Order count in the last 7 days',
    `order_activity_last_7d_count` BIGINT COMMENT 'Count of orders with an activity in the last 7 days',
    `order_activity_reduce_last_7d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (activity) in the last 7 days',
    `order_coupon_last_7d_count` BIGINT COMMENT 'Count of orders using a coupon in the last 7 days',
    `order_coupon_reduce_last_7d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (coupon) in the last 7 days',
    `order_last_7d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 7 days',
    `order_last_7d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 7 days',
    `order_last_30d_count` BIGINT COMMENT 'Order count in the last 30 days',
    `order_activity_last_30d_count` BIGINT COMMENT 'Count of orders with an activity in the last 30 days',
    `order_activity_reduce_last_30d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (activity) in the last 30 days',
    `order_coupon_last_30d_count` BIGINT COMMENT 'Count of orders using a coupon in the last 30 days',
    `order_coupon_reduce_last_30d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (coupon) in the last 30 days',
    `order_last_30d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 30 days',
    `order_last_30d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 30 days',
    `order_count` BIGINT COMMENT 'Cumulative order count',
    `order_activity_count` BIGINT COMMENT 'Cumulative count of orders with an activity',
    `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative order discount amount (activity)',
    `order_coupon_count` BIGINT COMMENT 'Cumulative count of orders using a coupon',
    `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative order discount amount (coupon)',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Cumulative original order amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Cumulative final order amount',
    `payment_date_first` STRING COMMENT 'First payment date',
    `payment_date_last` STRING COMMENT 'Last payment date',
    `payment_last_1d_count` BIGINT COMMENT 'Payment count in the last 1 day',
    `payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 1 day',
    `payment_last_7d_count` BIGINT COMMENT 'Payment count in the last 7 days',
    `payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 7 days',
    `payment_last_30d_count` BIGINT COMMENT 'Payment count in the last 30 days',
    `payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 30 days',
    `payment_count` BIGINT COMMENT 'Cumulative payment count',
    `payment_amount` DECIMAL(16,2) COMMENT 'Cumulative payment amount',
    `refund_order_last_1d_count` BIGINT COMMENT 'Refund order count in the last 1 day',
    `refund_order_last_1d_num` BIGINT COMMENT 'Refund order item quantity in the last 1 day',
    `refund_order_last_1d_amount` DECIMAL(16,2) COMMENT 'Refund order amount in the last 1 day',
    `refund_order_last_7d_count` BIGINT COMMENT 'Refund order count in the last 7 days',
    `refund_order_last_7d_num` BIGINT COMMENT 'Refund order item quantity in the last 7 days',
    `refund_order_last_7d_amount` DECIMAL(16,2) COMMENT 'Refund order amount in the last 7 days',
    `refund_order_last_30d_count` BIGINT COMMENT 'Refund order count in the last 30 days',
    `refund_order_last_30d_num` BIGINT COMMENT 'Refund order item quantity in the last 30 days',
    `refund_order_last_30d_amount` DECIMAL(16,2) COMMENT 'Refund order amount in the last 30 days',
    `refund_order_count` BIGINT COMMENT 'Cumulative refund order count',
    `refund_order_num` BIGINT COMMENT 'Cumulative refund order item quantity',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'Cumulative refund order amount',
    `refund_payment_last_1d_count` BIGINT COMMENT 'Refund payment count in the last 1 day',
    `refund_payment_last_1d_num` BIGINT COMMENT 'Refund payment item quantity in the last 1 day',
    `refund_payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Refund payment amount in the last 1 day',
    `refund_payment_last_7d_count` BIGINT COMMENT 'Refund payment count in the last 7 days',
    `refund_payment_last_7d_num` BIGINT COMMENT 'Refund payment item quantity in the last 7 days',
    `refund_payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Refund payment amount in the last 7 days',
    `refund_payment_last_30d_count` BIGINT COMMENT 'Refund payment count in the last 30 days',
    `refund_payment_last_30d_num` BIGINT COMMENT 'Refund payment item quantity in the last 30 days',
    `refund_payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Refund payment amount in the last 30 days',
    `refund_payment_count` BIGINT COMMENT 'Cumulative refund payment count',
    `refund_payment_num` BIGINT COMMENT 'Cumulative refund payment item quantity',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'Cumulative refund payment amount',
    `cart_last_1d_count` BIGINT COMMENT 'Add-to-cart count in the last 1 day',
    `cart_last_7d_count` BIGINT COMMENT 'Add-to-cart count in the last 7 days',
    `cart_last_30d_count` BIGINT COMMENT 'Add-to-cart count in the last 30 days',
    `cart_count` BIGINT COMMENT 'Cumulative add-to-cart count',
    `favor_last_1d_count` BIGINT COMMENT 'Favorite count in the last 1 day',
    `favor_last_7d_count` BIGINT COMMENT 'Favorite count in the last 7 days',
    `favor_last_30d_count` BIGINT COMMENT 'Favorite count in the last 30 days',
    `favor_count` BIGINT COMMENT 'Cumulative favorite count',
    `coupon_last_1d_get_count` BIGINT COMMENT 'Coupon claim count in the last 1 day',
    `coupon_last_1d_using_count` BIGINT COMMENT 'Coupon use (order) count in the last 1 day',
    `coupon_last_1d_used_count` BIGINT COMMENT 'Coupon use (payment) count in the last 1 day',
    `coupon_last_7d_get_count` BIGINT COMMENT 'Coupon claim count in the last 7 days',
    `coupon_last_7d_using_count` BIGINT COMMENT 'Coupon use (order) count in the last 7 days',
    `coupon_last_7d_used_count` BIGINT COMMENT 'Coupon use (payment) count in the last 7 days',
    `coupon_last_30d_get_count` BIGINT COMMENT 'Coupon claim count in the last 30 days',
    `coupon_last_30d_using_count` BIGINT COMMENT 'Coupon use (order) count in the last 30 days',
    `coupon_last_30d_used_count` BIGINT COMMENT 'Coupon use (payment) count in the last 30 days',
    `coupon_get_count` BIGINT COMMENT 'Cumulative coupon claim count',
    `coupon_using_count` BIGINT COMMENT 'Cumulative coupon use (order) count',
    `coupon_used_count` BIGINT COMMENT 'Cumulative coupon use (payment) count',
    `appraise_last_1d_good_count` BIGINT COMMENT 'Positive review count in the last 1 day',
    `appraise_last_1d_mid_count` BIGINT COMMENT 'Neutral review count in the last 1 day',
    `appraise_last_1d_bad_count` BIGINT COMMENT 'Negative review count in the last 1 day',
    `appraise_last_1d_default_count` BIGINT COMMENT 'Default review count in the last 1 day',
    `appraise_last_7d_good_count` BIGINT COMMENT 'Positive review count in the last 7 days',
    `appraise_last_7d_mid_count` BIGINT COMMENT 'Neutral review count in the last 7 days',
    `appraise_last_7d_bad_count` BIGINT COMMENT 'Negative review count in the last 7 days',
    `appraise_last_7d_default_count` BIGINT COMMENT 'Default review count in the last 7 days',
    `appraise_last_30d_good_count` BIGINT COMMENT 'Positive review count in the last 30 days',
    `appraise_last_30d_mid_count` BIGINT COMMENT 'Neutral review count in the last 30 days',
    `appraise_last_30d_bad_count` BIGINT COMMENT 'Negative review count in the last 30 days',
    `appraise_last_30d_default_count` BIGINT COMMENT 'Default review count in the last 30 days',
    `appraise_good_count` BIGINT COMMENT 'Cumulative positive review count',
    `appraise_mid_count` BIGINT COMMENT 'Cumulative neutral review count',
    `appraise_bad_count` BIGINT COMMENT 'Cumulative negative review count',
    `appraise_default_count` BIGINT COMMENT 'Cumulative default review count'
) COMMENT 'Member topic wide table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwt/dwt_user_topic/'
TBLPROPERTIES ("parquet.compression"="lzo");
Load data
First-day load
insert overwrite table dwt_user_topic partition(dt='2020-06-14')
select
    id,
    login_date_first, -- Use the user's creation date as the first login date
    nvl(login_date_last,date_add('2020-06-14',-1)), -- If historical login records exist, take the last login date from them; otherwise assign a fixed date
    nvl(login_last_1d_count,0),
    nvl(login_last_1d_day_count,0),
    nvl(login_last_7d_count,0),
    nvl(login_last_7d_day_count,0),
    nvl(login_last_30d_count,0),
    nvl(login_last_30d_day_count,0),
    nvl(login_count,0),
    nvl(login_day_count,0),
    order_date_first,
    order_date_last,
    nvl(order_last_1d_count,0),
    nvl(order_activity_last_1d_count,0),
    nvl(order_activity_reduce_last_1d_amount,0),
    nvl(order_coupon_last_1d_count,0),
    nvl(order_coupon_reduce_last_1d_amount,0),
    nvl(order_last_1d_original_amount,0),
    nvl(order_last_1d_final_amount,0),
    nvl(order_last_7d_count,0),
    nvl(order_activity_last_7d_count,0),
    nvl(order_activity_reduce_last_7d_amount,0),
    nvl(order_coupon_last_7d_count,0),
    nvl(order_coupon_reduce_last_7d_amount,0),
    nvl(order_last_7d_original_amount,0),
    nvl(order_last_7d_final_amount,0),
    nvl(order_last_30d_count,0),
    nvl(order_activity_last_30d_count,0),
    nvl(order_activity_reduce_last_30d_amount,0),
    nvl(order_coupon_last_30d_count,0),
    nvl(order_coupon_reduce_last_30d_amount,0),
    nvl(order_last_30d_original_amount,0),
    nvl(order_last_30d_final_amount,0),
    nvl(order_count,0),
    nvl(order_activity_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_count,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    payment_date_first,
    payment_date_last,
    nvl(payment_last_1d_count,0),
    nvl(payment_last_1d_amount,0),
    nvl(payment_last_7d_count,0),
    nvl(payment_last_7d_amount,0),
    nvl(payment_last_30d_count,0),
    nvl(payment_last_30d_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_last_1d_count,0),
    nvl(refund_order_last_1d_num,0),
    nvl(refund_order_last_1d_amount,0),
    nvl(refund_order_last_7d_count,0),
    nvl(refund_order_last_7d_num,0),
    nvl(refund_order_last_7d_amount,0),
    nvl(refund_order_last_30d_count,0),
    nvl(refund_order_last_30d_num,0),
    nvl(refund_order_last_30d_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_last_1d_count,0),
    nvl(refund_payment_last_1d_num,0),
    nvl(refund_payment_last_1d_amount,0),
    nvl(refund_payment_last_7d_count,0),
    nvl(refund_payment_last_7d_num,0),
    nvl(refund_payment_last_7d_amount,0),
    nvl(refund_payment_last_30d_count,0),
    nvl(refund_payment_last_30d_num,0),
    nvl(refund_payment_last_30d_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(cart_last_1d_count,0),
    nvl(cart_last_7d_count,0),
    nvl(cart_last_30d_count,0),
    nvl(cart_count,0),
    nvl(favor_last_1d_count,0),
    nvl(favor_last_7d_count,0),
    nvl(favor_last_30d_count,0),
    nvl(favor_count,0),
    nvl(coupon_last_1d_get_count,0),
    nvl(coupon_last_1d_using_count,0),
    nvl(coupon_last_1d_used_count,0),
    nvl(coupon_last_7d_get_count,0),
    nvl(coupon_last_7d_using_count,0),
    nvl(coupon_last_7d_used_count,0),
    nvl(coupon_last_30d_get_count,0),
    nvl(coupon_last_30d_using_count,0),
    nvl(coupon_last_30d_used_count,0),
    nvl(coupon_get_count,0),
    nvl(coupon_using_count,0),
    nvl(coupon_used_count,0),
    nvl(appraise_last_1d_good_count,0),
    nvl(appraise_last_1d_mid_count,0),
    nvl(appraise_last_1d_bad_count,0),
    nvl(appraise_last_1d_default_count,0),
    nvl(appraise_last_7d_good_count,0),
    nvl(appraise_last_7d_mid_count,0),
    nvl(appraise_last_7d_bad_count,0),
    nvl(appraise_last_7d_default_count,0),
    nvl(appraise_last_30d_good_count,0),
    nvl(appraise_last_30d_mid_count,0),
    nvl(appraise_last_30d_bad_count,0),
    nvl(appraise_last_30d_default_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0)
from
(
    select
        id,
        date_format(create_time,'yyyy-MM-dd') login_date_first
    from dim_user_info
    where dt='9999-99-99'
)t1
left join
(
    select
        user_id user_id,
        max(dt) login_date_last,
        sum(if(dt='2020-06-14',login_count,0)) login_last_1d_count,
        sum(if(dt='2020-06-14' and login_count>0,1,0)) login_last_1d_day_count,
        sum(if(dt>=date_add('2020-06-14',-6),login_count,0)) login_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6) and login_count>0,1,0)) login_last_7d_day_count,
        sum(if(dt>=date_add('2020-06-14',-29),login_count,0)) login_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29) and login_count>0,1,0)) login_last_30d_day_count,
        sum(login_count) login_count,
        sum(if(login_count>0,1,0)) login_day_count,
        min(if(order_count>0,dt,null)) order_date_first,
        max(if(order_count>0,dt,null)) order_date_last,
        sum(if(dt='2020-06-14',order_count,0)) order_last_1d_count,
        sum(if(dt='2020-06-14',order_activity_count,0)) order_activity_last_1d_count,
        sum(if(dt='2020-06-14',order_activity_reduce_amount,0)) order_activity_reduce_last_1d_amount,
        sum(if(dt='2020-06-14',order_coupon_count,0)) order_coupon_last_1d_count,
        sum(if(dt='2020-06-14',order_coupon_reduce_amount,0)) order_coupon_reduce_last_1d_amount,
        sum(if(dt='2020-06-14',order_original_amount,0)) order_last_1d_original_amount,
        sum(if(dt='2020-06-14',order_final_amount,0)) order_last_1d_final_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_count,0)) order_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),order_activity_count,0)) order_activity_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),order_activity_reduce_amount,0)) order_activity_reduce_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_coupon_count,0)) order_coupon_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),order_coupon_reduce_amount,0)) order_coupon_reduce_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_original_amount,0)) order_last_7d_original_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_final_amount,0)) order_last_7d_final_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_count,0)) order_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),order_activity_count,0)) order_activity_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),order_activity_reduce_amount,0)) order_activity_reduce_last_30d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_coupon_count,0)) order_coupon_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),order_coupon_reduce_amount,0)) order_coupon_reduce_last_30d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_original_amount,0)) order_last_30d_original_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_final_amount,0)) order_last_30d_final_amount,
        sum(order_count) order_count,
        sum(order_activity_count) order_activity_count,
        sum(order_activity_reduce_amount) order_activity_reduce_amount,
        sum(order_coupon_count) order_coupon_count,
        sum(order_coupon_reduce_amount) order_coupon_reduce_amount,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        min(if(payment_count>0,dt,null)) payment_date_first,
        max(if(payment_count>0,dt,null)) payment_date_last,
        sum(if(dt='2020-06-14',payment_count,0)) payment_last_1d_count,
        sum(if(dt='2020-06-14',payment_amount,0)) payment_last_1d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),payment_count,0)) payment_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),payment_amount,0)) payment_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),payment_count,0)) payment_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),payment_amount,0)) payment_last_30d_amount,
        sum(payment_count) payment_count,
        sum(payment_amount) payment_amount,
        sum(if(dt='2020-06-14',refund_order_count,0)) refund_order_last_1d_count,
        sum(if(dt='2020-06-14',refund_order_num,0)) refund_order_last_1d_num,
        sum(if(dt='2020-06-14',refund_order_amount,0)) refund_order_last_1d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),refund_order_count,0)) refund_order_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),refund_order_num,0)) refund_order_last_7d_num,
        sum(if(dt>=date_add('2020-06-14',-6),refund_order_amount,0)) refund_order_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),refund_order_count,0)) refund_order_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),refund_order_num,0)) refund_order_last_30d_num,
        sum(if(dt>=date_add('2020-06-14',-29),refund_order_amount,0)) refund_order_last_30d_amount,
        sum(refund_order_count) refund_order_count,
        sum(refund_order_num) refund_order_num,
        sum(refund_order_amount) refund_order_amount,
        sum(if(dt='2020-06-14',refund_payment_count,0)) refund_payment_last_1d_count,
        sum(if(dt='2020-06-14',refund_payment_num,0)) refund_payment_last_1d_num,
        sum(if(dt='2020-06-14',refund_payment_amount,0)) refund_payment_last_1d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),refund_payment_count,0)) refund_payment_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),refund_payment_num,0)) refund_payment_last_7d_num,
        sum(if(dt>=date_add('2020-06-14',-6),refund_payment_amount,0)) refund_payment_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),refund_payment_count,0)) refund_payment_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),refund_payment_num,0)) refund_payment_last_30d_num,
        sum(if(dt>=date_add('2020-06-14',-29),refund_payment_amount,0)) refund_payment_last_30d_amount,
        sum(refund_payment_count) refund_payment_count,
        sum(refund_payment_num) refund_payment_num,
        sum(refund_payment_amount) refund_payment_amount,
        sum(if(dt='2020-06-14',cart_count,0)) cart_last_1d_count,
        sum(if(dt>=date_add('2020-06-14',-6),cart_count,0)) cart_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-29),cart_count,0)) cart_last_30d_count,
        sum(cart_count) cart_count,
        sum(if(dt='2020-06-14',favor_count,0)) favor_last_1d_count,
        sum(if(dt>=date_add('2020-06-14',-6),favor_count,0)) favor_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-29),favor_count,0)) favor_last_30d_count,
        sum(favor_count) favor_count,
        sum(if(dt='2020-06-14',coupon_get_count,0)) coupon_last_1d_get_count,
        sum(if(dt='2020-06-14',coupon_using_count,0)) coupon_last_1d_using_count,
        sum(if(dt='2020-06-14',coupon_used_count,0)) coupon_last_1d_used_count,
        sum(if(dt>=date_add('2020-06-14',-6),coupon_get_count,0)) coupon_last_7d_get_count,
        sum(if(dt>=date_add('2020-06-14',-6),coupon_using_count,0)) coupon_last_7d_using_count,
        sum(if(dt>=date_add('2020-06-14',-6),coupon_used_count,0)) coupon_last_7d_used_count,
        sum(if(dt>=date_add('2020-06-14',-29),coupon_get_count,0)) coupon_last_30d_get_count,
        sum(if(dt>=date_add('2020-06-14',-29),coupon_using_count,0)) coupon_last_30d_using_count,
        sum(if(dt>=date_add('2020-06-14',-29),coupon_used_count,0)) coupon_last_30d_used_count,
        sum(coupon_get_count) coupon_get_count,
        sum(coupon_using_count) coupon_using_count,
        sum(coupon_used_count) coupon_used_count,
        sum(if(dt='2020-06-14',appraise_good_count,0)) appraise_last_1d_good_count,
        sum(if(dt='2020-06-14',appraise_mid_count,0)) appraise_last_1d_mid_count,
        sum(if(dt='2020-06-14',appraise_bad_count,0)) appraise_last_1d_bad_count,
        sum(if(dt='2020-06-14',appraise_default_count,0)) appraise_last_1d_default_count,
        sum(if(dt>=date_add('2020-06-14',-6),appraise_good_count,0)) appraise_last_7d_good_count,
        sum(if(dt>=date_add('2020-06-14',-6),appraise_mid_count,0)) appraise_last_7d_mid_count,
        sum(if(dt>=date_add('2020-06-14',-6),appraise_bad_count,0)) appraise_last_7d_bad_count,
        sum(if(dt>=date_add('2020-06-14',-6),appraise_default_count,0)) appraise_last_7d_default_count,
        sum(if(dt>=date_add('2020-06-14',-29),appraise_good_count,0)) appraise_last_30d_good_count,
        sum(if(dt>=date_add('2020-06-14',-29),appraise_mid_count,0)) appraise_last_30d_mid_count,
        sum(if(dt>=date_add('2020-06-14',-29),appraise_bad_count,0)) appraise_last_30d_bad_count,
        sum(if(dt>=date_add('2020-06-14',-29),appraise_default_count,0)) appraise_last_30d_default_count,
        sum(appraise_good_count) appraise_good_count,
        sum(appraise_mid_count) appraise_mid_count,
        sum(appraise_bad_count) appraise_bad_count,
        sum(appraise_default_count) appraise_default_count
    from dws_user_action_daycount
    group by user_id
)t2
on t1.id=t2.user_id;
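Most metrics in the first-day load repeat the same four aggregates: the load date only, the last 7 days (`date_add(..., -6)`, inclusive of the load date), the last 30 days (`date_add(..., -29)`), and the cumulative total. As a rough illustration of that pattern, here is a small shell sketch that prints the four expressions for one metric. The function name `emit_windows` and the `WIN` placeholder in the alias pattern are assumptions made for this example, not something the original scripts define.

```shell
#!/bin/bash
# Hypothetical generator for the repeated 1d / 7d / 30d / cumulative aggregates.
#   $1 = load date (yyyy-MM-dd)
#   $2 = source column in dws_user_action_daycount
#   $3 = alias pattern containing the literal token WIN, e.g. cart_WIN_count
emit_windows() {
    local do_date=$1 col=$2 pattern=$3
    # Load date only.
    echo "sum(if(dt='${do_date}',${col},0)) ${pattern/WIN/last_1d},"
    # Last 7 days: load date plus the 6 days before it.
    echo "sum(if(dt>=date_add('${do_date}',-6),${col},0)) ${pattern/WIN/last_7d},"
    # Last 30 days: load date plus the 29 days before it.
    echo "sum(if(dt>=date_add('${do_date}',-29),${col},0)) ${pattern/WIN/last_30d},"
    # Cumulative total over every partition.
    echo "sum(${col}) ${col},"
}

emit_windows 2020-06-14 cart_count cart_WIN_count
```

The alias pattern is passed separately because the column naming is not uniform: compare `cart_last_7d_count` with `coupon_last_7d_get_count` and `order_last_7d_original_amount`.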
Daily load
insert overwrite table dwt_user_topic partition(dt='2020-06-15')
select
    nvl(1d_ago.user_id,old.user_id),
    nvl(old.login_date_first,'2020-06-15'),
    if(1d_ago.user_id is not null,'2020-06-15',old.login_date_last),
    nvl(1d_ago.login_count,0),
    if(1d_ago.user_id is not null,1,0),
    nvl(old.login_last_7d_count,0)+nvl(1d_ago.login_count,0)-nvl(7d_ago.login_count,0),
    nvl(old.login_last_7d_day_count,0)+if(1d_ago.user_id is null,0,1)-if(7d_ago.user_id is null,0,1),
    nvl(old.login_last_30d_count,0)+nvl(1d_ago.login_count,0)-nvl(30d_ago.login_count,0),
    nvl(old.login_last_30d_day_count,0)+if(1d_ago.user_id is null,0,1)-if(30d_ago.user_id is null,0,1),
    nvl(old.login_count,0)+nvl(1d_ago.login_count,0),
    nvl(old.login_day_count,0)+if(1d_ago.user_id is not null,1,0),
    if(old.order_date_first is null and 1d_ago.order_count>0,'2020-06-15',old.order_date_first),
    if(1d_ago.order_count>0,'2020-06-15',old.order_date_last),
    nvl(1d_ago.order_count,0),
    nvl(1d_ago.order_activity_count,0),
    nvl(1d_ago.order_activity_reduce_amount,0.0),
    nvl(1d_ago.order_coupon_count,0),
    nvl(1d_ago.order_coupon_reduce_amount,0.0),
    nvl(1d_ago.order_original_amount,0.0),
    nvl(1d_ago.order_final_amount,0.0),
    nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)-nvl(7d_ago.order_count,0),
    nvl(old.order_activity_last_7d_count,0)+nvl(1d_ago.order_activity_count,0)-nvl(7d_ago.order_activity_count,0),
    nvl(old.order_activity_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)-nvl(7d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_last_7d_count,0)+nvl(1d_ago.order_coupon_count,0)-nvl(7d_ago.order_coupon_count,0),
    nvl(old.order_coupon_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)-nvl(7d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(7d_ago.order_original_amount,0.0),
    nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(7d_ago.order_final_amount,0.0),
    nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)-nvl(30d_ago.order_count,0),
    nvl(old.order_activity_last_30d_count,0)+nvl(1d_ago.order_activity_count,0)-nvl(30d_ago.order_activity_count,0),
    nvl(old.order_activity_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)-nvl(30d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_last_30d_count,0)+nvl(1d_ago.order_coupon_count,0)-nvl(30d_ago.order_coupon_count,0),
    nvl(old.order_coupon_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)-nvl(30d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(30d_ago.order_original_amount,0.0),
    nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(30d_ago.order_final_amount,0.0),
    nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
    nvl(old.order_activity_count,0)+nvl(1d_ago.order_activity_count,0),
    nvl(old.order_activity_reduce_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_count,0)+nvl(1d_ago.order_coupon_count,0),
    nvl(old.order_coupon_reduce_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
    nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
    if(old.payment_date_first is null and 1d_ago.payment_count>0,'2020-06-15',old.payment_date_first),
    if(1d_ago.payment_count>0,'2020-06-15',old.payment_date_last),
    nvl(1d_ago.payment_count,0),
    nvl(1d_ago.payment_amount,0.0),
    nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)-nvl(7d_ago.payment_count,0),
    nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(7d_ago.payment_amount,0.0),
    nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)-nvl(30d_ago.payment_count,0),
    nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(30d_ago.payment_amount,0.0),
    nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
    nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
    nvl(1d_ago.refund_order_count,0),
    nvl(1d_ago.refund_order_num,0),
    nvl(1d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)-nvl(7d_ago.refund_order_count,0),
    nvl(old.refund_order_last_7d_num,0)+nvl(1d_ago.refund_order_num,0)-nvl(7d_ago.refund_order_num,0),
    nvl(old.refund_order_last_7d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)-nvl(7d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)-nvl(30d_ago.refund_order_count,0),
    nvl(old.refund_order_last_30d_num,0)+nvl(1d_ago.refund_order_num,0)-nvl(30d_ago.refund_order_num,0),
    nvl(old.refund_order_last_30d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)-nvl(30d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
    nvl(old.refund_order_num,0)+nvl(1d_ago.refund_order_num,0),
    nvl(old.refund_order_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0),
    nvl(1d_ago.refund_payment_count,0),
    nvl(1d_ago.refund_payment_num,0),
    nvl(1d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(7d_ago.refund_payment_count,0),
    nvl(old.refund_payment_last_7d_num,0)+nvl(1d_ago.refund_payment_num,0)-nvl(7d_ago.refund_payment_num,0),
    nvl(old.refund_payment_last_7d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)-nvl(7d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(30d_ago.refund_payment_count,0),
    nvl(old.refund_payment_last_30d_num,0)+nvl(1d_ago.refund_payment_num,0)-nvl(30d_ago.refund_payment_num,0),
    nvl(old.refund_payment_last_30d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)-nvl(30d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
    nvl(old.refund_payment_num,0)+nvl(1d_ago.refund_payment_num,0),
    nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0),
    nvl(1d_ago.cart_count,0),
    nvl(old.cart_last_7d_count,0)+nvl(1d_ago.cart_count,0)-nvl(7d_ago.cart_count,0),
    nvl(old.cart_last_30d_count,0)+nvl(1d_ago.cart_count,0)-nvl(30d_ago.cart_count,0),
    nvl(old.cart_count,0)+nvl(1d_ago.cart_count,0),
    nvl(1d_ago.favor_count,0),
    nvl(old.favor_last_7d_count,0)+nvl(1d_ago.favor_count,0)-nvl(7d_ago.favor_count,0),
    nvl(old.favor_last_30d_count,0)+nvl(1d_ago.favor_count,0)-nvl(30d_ago.favor_count,0),
    nvl(old.favor_count,0)+nvl(1d_ago.favor_count,0),
    nvl(1d_ago.coupon_get_count,0),
    nvl(1d_ago.coupon_using_count,0),
    nvl(1d_ago.coupon_used_count,0),
    nvl(old.coupon_last_7d_get_count,0)+nvl(1d_ago.coupon_get_count,0)-nvl(7d_ago.coupon_get_count,0),
    nvl(old.coupon_last_7d_using_count,0)+nvl(1d_ago.coupon_using_count,0)-nvl(7d_ago.coupon_using_count,0),
    nvl(old.coupon_last_7d_used_count,0)+nvl(1d_ago.coupon_used_count,0)-nvl(7d_ago.coupon_used_count,0),
    nvl(old.coupon_last_30d_get_count,0)+nvl(1d_ago.coupon_get_count,0)-nvl(30d_ago.coupon_get_count,0),
    nvl(old.coupon_last_30d_using_count,0)+nvl(1d_ago.coupon_using_count,0)-nvl(30d_ago.coupon_using_count,0),
    nvl(old.coupon_last_30d_used_count,0)+nvl(1d_ago.coupon_used_count,0)-nvl(30d_ago.coupon_used_count,0),
    nvl(old.coupon_get_count,0)+nvl(1d_ago.coupon_get_count,0),
    nvl(old.coupon_using_count,0)+nvl(1d_ago.coupon_using_count,0),
    nvl(old.coupon_used_count,0)+nvl(1d_ago.coupon_used_count,0),
    nvl(1d_ago.appraise_good_count,0),
    nvl(1d_ago.appraise_mid_count,0),
    nvl(1d_ago.appraise_bad_count,0),
    nvl(1d_ago.appraise_default_count,0),
    nvl(old.appraise_last_7d_good_count,0)+nvl(1d_ago.appraise_good_count,0)-nvl(7d_ago.appraise_good_count,0),
    nvl(old.appraise_last_7d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(7d_ago.appraise_mid_count,0),
    nvl(old.appraise_last_7d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(7d_ago.appraise_bad_count,0),
    nvl(old.appraise_last_7d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(7d_ago.appraise_default_count,0),
    nvl(old.appraise_last_30d_good_count,0)+nvl(1d_ago.appraise_good_count,0)-nvl(30d_ago.appraise_good_count,0),
    nvl(old.appraise_last_30d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(30d_ago.appraise_mid_count,0),
    nvl(old.appraise_last_30d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(30d_ago.appraise_bad_count,0),
    nvl(old.appraise_last_30d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(30d_ago.appraise_default_count,0),
    nvl(old.appraise_good_count,0)+nvl(1d_ago.appraise_good_count,0),
    nvl(old.appraise_mid_count,0)+nvl(1d_ago.appraise_mid_count,0),
    nvl(old.appraise_bad_count,0)+nvl(1d_ago.appraise_bad_count,0),
    nvl(old.appraise_default_count,0)+nvl(1d_ago.appraise_default_count,0)
from
(
    select
        user_id,
        login_date_first, login_date_last,
        login_date_1d_count, login_last_1d_day_count,
        login_last_7d_count, login_last_7d_day_count,
        login_last_30d_count, login_last_30d_day_count,
        login_count, login_day_count,
        order_date_first, order_date_last,
        order_last_1d_count, order_activity_last_1d_count, order_activity_reduce_last_1d_amount,
        order_coupon_last_1d_count, order_coupon_reduce_last_1d_amount,
        order_last_1d_original_amount, order_last_1d_final_amount,
        order_last_7d_count, order_activity_last_7d_count, order_activity_reduce_last_7d_amount,
        order_coupon_last_7d_count, order_coupon_reduce_last_7d_amount,
        order_last_7d_original_amount, order_last_7d_final_amount,
        order_last_30d_count, order_activity_last_30d_count, order_activity_reduce_last_30d_amount,
        order_coupon_last_30d_count, order_coupon_reduce_last_30d_amount,
        order_last_30d_original_amount, order_last_30d_final_amount,
        order_count, order_activity_count, order_activity_reduce_amount,
        order_coupon_count, order_coupon_reduce_amount,
        order_original_amount, order_final_amount,
        payment_date_first, payment_date_last,
        payment_last_1d_count, payment_last_1d_amount,
        payment_last_7d_count, payment_last_7d_amount,
        payment_last_30d_count, payment_last_30d_amount,
        payment_count, payment_amount,
        refund_order_last_1d_count, refund_order_last_1d_num, refund_order_last_1d_amount,
        refund_order_last_7d_count, refund_order_last_7d_num, refund_order_last_7d_amount,
        refund_order_last_30d_count, refund_order_last_30d_num, refund_order_last_30d_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        refund_payment_last_1d_count, refund_payment_last_1d_num, refund_payment_last_1d_amount,
        refund_payment_last_7d_count, refund_payment_last_7d_num, refund_payment_last_7d_amount,
        refund_payment_last_30d_count, refund_payment_last_30d_num, refund_payment_last_30d_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        cart_last_1d_count, cart_last_7d_count, cart_last_30d_count, cart_count,
        favor_last_1d_count, favor_last_7d_count, favor_last_30d_count, favor_count,
        coupon_last_1d_get_count, coupon_last_1d_using_count, coupon_last_1d_used_count,
        coupon_last_7d_get_count, coupon_last_7d_using_count, coupon_last_7d_used_count,
        coupon_last_30d_get_count, coupon_last_30d_using_count, coupon_last_30d_used_count,
        coupon_get_count, coupon_using_count, coupon_used_count,
        appraise_last_1d_good_count, appraise_last_1d_mid_count, appraise_last_1d_bad_count, appraise_last_1d_default_count,
        appraise_last_7d_good_count, appraise_last_7d_mid_count, appraise_last_7d_bad_count, appraise_last_7d_default_count,
        appraise_last_30d_good_count, appraise_last_30d_mid_count, appraise_last_30d_bad_count, appraise_last_30d_default_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from dwt_user_topic
    where dt=date_add('2020-06-15',-1)
)old
full outer join
(
    select
        user_id,
        login_count,
        cart_count,
        favor_count,
        order_count, order_activity_count, order_activity_reduce_amount,
        order_coupon_count, order_coupon_reduce_amount,
        order_original_amount, order_final_amount,
        payment_count, payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        coupon_get_count, coupon_using_count, coupon_used_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from dws_user_action_daycount
    where dt='2020-06-15'
)1d_ago
on old.user_id=1d_ago.user_id
left join
(
    select
        user_id,
        login_count,
        cart_count,
        favor_count,
        order_count, order_activity_count, order_activity_reduce_amount,
        order_coupon_count, order_coupon_reduce_amount,
        order_original_amount, order_final_amount,
        payment_count, payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        coupon_get_count, coupon_using_count, coupon_used_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from dws_user_action_daycount
    where dt=date_add('2020-06-15',-7)
)7d_ago
on old.user_id=7d_ago.user_id
left join
(
    select
        user_id,
        login_count,
        cart_count,
        favor_count,
        order_count, order_activity_count, order_activity_reduce_amount,
        order_coupon_count, order_coupon_reduce_amount,
        order_original_amount, order_final_amount,
        payment_count, payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        coupon_get_count, coupon_using_count, coupon_used_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from dws_user_action_daycount
    where dt=date_add('2020-06-15',-30)
)30d_ago
on old.user_id=30d_ago.user_id;
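The daily load reads four partitions relative to the load date: the load date itself, the previous day (`old`), 7 days back and 30 days back. When checking by hand which partitions a given run will touch, the same offsets can be computed in the shell. The sketch below assumes GNU `date` (its `-d` option) and uses a hypothetical helper name `shift_date`; it only prints dates and does not call Hive.

```shell
#!/bin/bash
# Hypothetical helper: shift a yyyy-MM-dd date by a signed number of days.
# Assumes GNU date; TZ=UTC keeps daylight-saving changes from shifting the result.
shift_date() {
    TZ=UTC date -d "$1 $2 day" +%F
}

# Load date used in the statement above; replace as needed.
do_date=2020-06-15

echo "load date      : $do_date"
echo "old   (-1 day) : $(shift_date "$do_date" -1)"
echo "7d_ago (-7)    : $(shift_date "$do_date" -7)"
echo "30d_ago (-30)  : $(shift_date "$do_date" -30)"
```

Note the asymmetry with the first-day load: there the windows are filters (`-6` and `-29`, inclusive of the load date), while here `-7` and `-30` name the single day that leaves each window.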
# Product (SKU) topic
Table creation statement
DROP TABLE IF EXISTS dwt_sku_topic;
CREATE EXTERNAL TABLE dwt_sku_topic
(
    `sku_id` STRING COMMENT 'sku_id',
    `order_last_1d_count` BIGINT COMMENT 'Number of orders in the last 1 day',
    `order_last_1d_num` BIGINT COMMENT 'Number of items ordered in the last 1 day',
    `order_activity_last_1d_count` BIGINT COMMENT 'Number of orders under an activity in the last 1 day',
    `order_coupon_last_1d_count` BIGINT COMMENT 'Number of orders using a coupon in the last 1 day',
    `order_activity_reduce_last_1d_amount` DECIMAL(16,2) COMMENT 'Discount amount (activity) in the last 1 day',
    `order_coupon_reduce_last_1d_amount` DECIMAL(16,2) COMMENT 'Discount amount (coupon) in the last 1 day',
    `order_last_1d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 1 day',
    `order_last_1d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 1 day',
    `order_last_7d_count` BIGINT COMMENT 'Number of orders in the last 7 days',
    `order_last_7d_num` BIGINT COMMENT 'Number of items ordered in the last 7 days',
    `order_activity_last_7d_count` BIGINT COMMENT 'Number of orders under an activity in the last 7 days',
    `order_coupon_last_7d_count` BIGINT COMMENT 'Number of orders using a coupon in the last 7 days',
    `order_activity_reduce_last_7d_amount` DECIMAL(16,2) COMMENT 'Discount amount (activity) in the last 7 days',
    `order_coupon_reduce_last_7d_amount` DECIMAL(16,2) COMMENT 'Discount amount (coupon) in the last 7 days',
    `order_last_7d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 7 days',
    `order_last_7d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 7 days',
    `order_last_30d_count` BIGINT COMMENT 'Number of orders in the last 30 days',
    `order_last_30d_num` BIGINT COMMENT 'Number of items ordered in the last 30 days',
    `order_activity_last_30d_count` BIGINT COMMENT 'Number of orders under an activity in the last 30 days',
    `order_coupon_last_30d_count` BIGINT COMMENT 'Number of orders using a coupon in the last 30 days',
    `order_activity_reduce_last_30d_amount` DECIMAL(16,2) COMMENT 'Discount amount (activity) in the last 30 days',
    `order_coupon_reduce_last_30d_amount` DECIMAL(16,2) COMMENT 'Discount amount (coupon) in the last 30 days',
    `order_last_30d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 30 days',
    `order_last_30d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 30 days',
    `order_count` BIGINT COMMENT 'Cumulative number of orders',
    `order_num` BIGINT COMMENT 'Cumulative number of items ordered',
    `order_activity_count` BIGINT COMMENT 'Cumulative number of orders under an activity',
    `order_coupon_count` BIGINT COMMENT 'Cumulative number of orders using a coupon',
    `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative discount amount (activity)',
    `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative discount amount (coupon)',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Cumulative original order amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Cumulative final order amount',
    `payment_last_1d_count` BIGINT COMMENT 'Number of payments in the last 1 day',
    `payment_last_1d_num` BIGINT COMMENT 'Number of items paid for in the last 1 day',
    `payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 1 day',
    `payment_last_7d_count` BIGINT COMMENT 'Number of payments in the last 7 days',
    `payment_last_7d_num` BIGINT COMMENT 'Number of items paid for in the last 7 days',
    `payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 7 days',
    `payment_last_30d_count` BIGINT COMMENT 'Number of payments in the last 30 days',
    `payment_last_30d_num` BIGINT COMMENT 'Number of items paid for in the last 30 days',
    `payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 30 days',
    `payment_count` BIGINT COMMENT 'Cumulative number of payments',
    `payment_num` BIGINT COMMENT 'Cumulative number of items paid for',
    `payment_amount` DECIMAL(16,2) COMMENT 'Cumulative payment amount',
    `refund_order_last_1d_count` BIGINT COMMENT 'Number of order refunds in the last 1 day',
    `refund_order_last_1d_num` BIGINT COMMENT 'Number of items in order refunds in the last 1 day',
    `refund_order_last_1d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 1 day',
    `refund_order_last_7d_count` BIGINT COMMENT 'Number of order refunds in the last 7 days',
    `refund_order_last_7d_num` BIGINT COMMENT 'Number of items in order refunds in the last 7 days',
    `refund_order_last_7d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 7 days',
    `refund_order_last_30d_count` BIGINT COMMENT 'Number of order refunds in the last 30 days',
    `refund_order_last_30d_num` BIGINT COMMENT 'Number of items in order refunds in the last 30 days',
    `refund_order_last_30d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 30 days',
    `refund_order_count` BIGINT COMMENT 'Cumulative number of order refunds',
    `refund_order_num` BIGINT COMMENT 'Cumulative number of items in order refunds',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'Cumulative order refund amount',
    `refund_payment_last_1d_count` BIGINT COMMENT 'Number of refund payments in the last 1 day',
    `refund_payment_last_1d_num` BIGINT COMMENT 'Number of items in refund payments in the last 1 day',
    `refund_payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Refund payment amount in the last 1 day',
    `refund_payment_last_7d_count` BIGINT COMMENT 'Number of refund payments in the last 7 days',
    `refund_payment_last_7d_num` BIGINT COMMENT 'Number of items in refund payments in the last 7 days',
    `refund_payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Refund payment amount in the last 7 days',
    `refund_payment_last_30d_count` BIGINT COMMENT 'Number of refund payments in the last 30 days',
    `refund_payment_last_30d_num` BIGINT COMMENT 'Number of items in refund payments in the last 30 days',
    `refund_payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Refund payment amount in the last 30 days',
    `refund_payment_count` BIGINT COMMENT 'Cumulative number of refund payments',
    `refund_payment_num` BIGINT COMMENT 'Cumulative number of items in refund payments',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'Cumulative refund payment amount',
    `cart_last_1d_count` BIGINT COMMENT 'Times added to cart in the last 1 day',
    `cart_last_7d_count` BIGINT COMMENT 'Times added to cart in the last 7 days',
    `cart_last_30d_count` BIGINT COMMENT 'Times added to cart in the last 30 days',
    `cart_count` BIGINT COMMENT 'Cumulative times added to cart',
    `favor_last_1d_count` BIGINT COMMENT 'Times favorited in the last 1 day',
    `favor_last_7d_count` BIGINT COMMENT 'Times favorited in the last 7 days',
    `favor_last_30d_count` BIGINT COMMENT 'Times favorited in the last 30 days',
    `favor_count` BIGINT COMMENT 'Cumulative times favorited',
    `appraise_last_1d_good_count` BIGINT COMMENT 'Positive reviews in the last 1 day',
    `appraise_last_1d_mid_count` BIGINT COMMENT 'Neutral reviews in the last 1 day',
    `appraise_last_1d_bad_count` BIGINT COMMENT 'Negative reviews in the last 1 day',
    `appraise_last_1d_default_count` BIGINT COMMENT 'Default reviews in the last 1 day',
    `appraise_last_7d_good_count` BIGINT COMMENT 'Positive reviews in the last 7 days',
    `appraise_last_7d_mid_count` BIGINT COMMENT 'Neutral reviews in the last 7 days',
    `appraise_last_7d_bad_count` BIGINT COMMENT 'Negative reviews in the last 7 days',
    `appraise_last_7d_default_count` BIGINT COMMENT 'Default reviews in the last 7 days',
    `appraise_last_30d_good_count` BIGINT COMMENT 'Positive reviews in the last 30 days',
    `appraise_last_30d_mid_count` BIGINT COMMENT 'Neutral reviews in the last 30 days',
    `appraise_last_30d_bad_count` BIGINT COMMENT 'Negative reviews in the last 30 days',
    `appraise_last_30d_default_count` BIGINT COMMENT 'Default reviews in the last 30 days',
    `appraise_good_count` BIGINT COMMENT 'Cumulative positive reviews',
    `appraise_mid_count` BIGINT COMMENT 'Cumulative neutral reviews',
    `appraise_bad_count` BIGINT COMMENT 'Cumulative negative reviews',
    `appraise_default_count` BIGINT COMMENT 'Cumulative default reviews'
) COMMENT 'Product (SKU) topic wide table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwt/dwt_sku_topic/'
TBLPROPERTIES ("parquet.compression"="lzo");
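The columns of dwt_sku_topic follow a regular naming pattern: each metric appears as `<prefix>_last_1d_<suffix>`, `<prefix>_last_7d_<suffix>`, `<prefix>_last_30d_<suffix>` and finally the cumulative `<prefix>_<suffix>`. The following is only a small bash sketch of that pattern; `window_columns` is a hypothetical helper introduced here for illustration and is not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original notes):
# print the four column names used for one metric, i.e. the
# last-1-day, last-7-day and last-30-day columns plus the cumulative one.
window_columns() {
    local prefix=$1 suffix=$2
    local n
    for n in 1 7 30; do
        echo "${prefix}_last_${n}d_${suffix}"
    done
    echo "${prefix}_${suffix}"
}

# Example: the cart and positive-review column families.
window_columns cart count
window_columns appraise good_count
```

Comparing such a generated list against `desc dwt_sku_topic` could be one way to spot a misspelled column name, though that check is left to the reader.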
Data loading
First-day load
insert overwrite table dwt_sku_topic partition(dt='2020-06-14')
select
    id,
    nvl(order_last_1d_count,0),
    nvl(order_last_1d_num,0),
    nvl(order_activity_last_1d_count,0),
    nvl(order_coupon_last_1d_count,0),
    nvl(order_activity_reduce_last_1d_amount,0),
    nvl(order_coupon_reduce_last_1d_amount,0),
    nvl(order_last_1d_original_amount,0),
    nvl(order_last_1d_final_amount,0),
    nvl(order_last_7d_count,0),
    nvl(order_last_7d_num,0),
    nvl(order_activity_last_7d_count,0),
    nvl(order_coupon_last_7d_count,0),
    nvl(order_activity_reduce_last_7d_amount,0),
    nvl(order_coupon_reduce_last_7d_amount,0),
    nvl(order_last_7d_original_amount,0),
    nvl(order_last_7d_final_amount,0),
    nvl(order_last_30d_count,0),
    nvl(order_last_30d_num,0),
    nvl(order_activity_last_30d_count,0),
    nvl(order_coupon_last_30d_count,0),
    nvl(order_activity_reduce_last_30d_amount,0),
    nvl(order_coupon_reduce_last_30d_amount,0),
    nvl(order_last_30d_original_amount,0),
    nvl(order_last_30d_final_amount,0),
    nvl(order_count,0),
    nvl(order_num,0),
    nvl(order_activity_count,0),
    nvl(order_coupon_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_last_1d_count,0),
    nvl(payment_last_1d_num,0),
    nvl(payment_last_1d_amount,0),
    nvl(payment_last_7d_count,0),
    nvl(payment_last_7d_num,0),
    nvl(payment_last_7d_amount,0),
    nvl(payment_last_30d_count,0),
    nvl(payment_last_30d_num,0),
    nvl(payment_last_30d_amount,0),
    nvl(payment_count,0),
    nvl(payment_num,0),
    nvl(payment_amount,0),
    nvl(refund_order_last_1d_count,0),
    nvl(refund_order_last_1d_num,0),
    nvl(refund_order_last_1d_amount,0),
    nvl(refund_order_last_7d_count,0),
    nvl(refund_order_last_7d_num,0),
    nvl(refund_order_last_7d_amount,0),
    nvl(refund_order_last_30d_count,0),
    nvl(refund_order_last_30d_num,0),
    nvl(refund_order_last_30d_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_last_1d_count,0),
    nvl(refund_payment_last_1d_num,0),
    nvl(refund_payment_last_1d_amount,0),
    nvl(refund_payment_last_7d_count,0),
    nvl(refund_payment_last_7d_num,0),
    nvl(refund_payment_last_7d_amount,0),
    nvl(refund_payment_last_30d_count,0),
    nvl(refund_payment_last_30d_num,0),
    nvl(refund_payment_last_30d_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(cart_last_1d_count,0),
    nvl(cart_last_7d_count,0),
    nvl(cart_last_30d_count,0),
    nvl(cart_count,0),
    nvl(favor_last_1d_count,0),
    nvl(favor_last_7d_count,0),
    nvl(favor_last_30d_count,0),
    nvl(favor_count,0),
    nvl(appraise_last_1d_good_count,0),
    nvl(appraise_last_1d_mid_count,0),
    nvl(appraise_last_1d_bad_count,0),
    nvl(appraise_last_1d_default_count,0),
    nvl(appraise_last_7d_good_count,0),
    nvl(appraise_last_7d_mid_count,0),
    nvl(appraise_last_7d_bad_count,0),
    nvl(appraise_last_7d_default_count,0),
    nvl(appraise_last_30d_good_count,0),
    nvl(appraise_last_30d_mid_count,0),
    nvl(appraise_last_30d_bad_count,0),
    nvl(appraise_last_30d_default_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0)
from
(
    select
        id
    from dim_sku_info
    where dt='2020-06-14'
)t1
left join
(
    select
        sku_id,
        sum(if(dt='2020-06-14',order_count,0)) order_last_1d_count,
        sum(if(dt='2020-06-14',order_num,0)) order_last_1d_num,
        sum(if(dt='2020-06-14',order_activity_count,0)) order_activity_last_1d_count,
        sum(if(dt='2020-06-14',order_coupon_count,0)) order_coupon_last_1d_count,
        sum(if(dt='2020-06-14',order_activity_reduce_amount,0)) order_activity_reduce_last_1d_amount,
        sum(if(dt='2020-06-14',order_coupon_reduce_amount,0)) order_coupon_reduce_last_1d_amount,
        sum(if(dt='2020-06-14',order_original_amount,0)) order_last_1d_original_amount,
        sum(if(dt='2020-06-14',order_final_amount,0)) order_last_1d_final_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_count,0)) order_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),order_num,0)) order_last_7d_num,
        sum(if(dt>=date_add('2020-06-14',-6),order_activity_count,0)) order_activity_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),order_coupon_count,0)) order_coupon_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),order_activity_reduce_amount,0)) order_activity_reduce_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_coupon_reduce_amount,0)) order_coupon_reduce_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_original_amount,0)) order_last_7d_original_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_final_amount,0)) order_last_7d_final_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_count,0)) order_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),order_num,0)) order_last_30d_num,
        sum(if(dt>=date_add('2020-06-14',-29),order_activity_count,0)) order_activity_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),order_coupon_count,0)) order_coupon_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),order_activity_reduce_amount,0)) order_activity_reduce_last_30d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_coupon_reduce_amount,0)) order_coupon_reduce_last_30d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_original_amount,0)) order_last_30d_original_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_final_amount,0)) order_last_30d_final_amount,
        sum(order_count) order_count,
        sum(order_num) order_num,
        sum(order_activity_count) order_activity_count,
        sum(order_coupon_count) order_coupon_count,
        sum(order_activity_reduce_amount) order_activity_reduce_amount,
        sum(order_coupon_reduce_amount) order_coupon_reduce_amount,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(if(dt='2020-06-14',payment_count,0)) payment_last_1d_count,
        sum(if(dt='2020-06-14',payment_num,0)) payment_last_1d_num,
        sum(if(dt='2020-06-14',payment_amount,0)) payment_last_1d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),payment_count,0)) payment_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),payment_num,0)) payment_last_7d_num,
        sum(if(dt>=date_add('2020-06-14',-6),payment_amount,0)) payment_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),payment_count,0)) payment_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),payment_num,0)) payment_last_30d_num,
        sum(if(dt>=date_add('2020-06-14',-29),payment_amount,0)) payment_last_30d_amount,
        sum(payment_count) payment_count,
        sum(payment_num) payment_num,
        sum(payment_amount) payment_amount,
        sum(if(dt='2020-06-14',refund_order_count,0)) refund_order_last_1d_count,
        sum(if(dt='2020-06-14',refund_order_num,0)) refund_order_last_1d_num,
        sum(if(dt='2020-06-14',refund_order_amount,0)) refund_order_last_1d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),refund_order_count,0)) refund_order_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),refund_order_num,0)) refund_order_last_7d_num,
        sum(if(dt>=date_add('2020-06-14',-6),refund_order_amount,0)) refund_order_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),refund_order_count,0)) refund_order_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),refund_order_num,0)) refund_order_last_30d_num,
        sum(if(dt>=date_add('2020-06-14',-29),refund_order_amount,0)) refund_order_last_30d_amount,
        sum(refund_order_count) refund_order_count,
        sum(refund_order_num) refund_order_num,
        sum(refund_order_amount) refund_order_amount,
        sum(if(dt='2020-06-14',refund_payment_count,0)) refund_payment_last_1d_count,
        sum(if(dt='2020-06-14',refund_payment_num,0)) refund_payment_last_1d_num,
        sum(if(dt='2020-06-14',refund_payment_amount,0)) refund_payment_last_1d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),refund_payment_count,0)) refund_payment_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),refund_payment_num,0)) refund_payment_last_7d_num,
        sum(if(dt>=date_add('2020-06-14',-6),refund_payment_amount,0)) refund_payment_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),refund_payment_count,0)) refund_payment_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),refund_payment_num,0)) refund_payment_last_30d_num,
        sum(if(dt>=date_add('2020-06-14',-29),refund_payment_amount,0)) refund_payment_last_30d_amount,
        sum(refund_payment_count) refund_payment_count,
        sum(refund_payment_num) refund_payment_num,
        sum(refund_payment_amount) refund_payment_amount,
        sum(if(dt='2020-06-14',cart_count,0)) cart_last_1d_count,
        sum(if(dt>=date_add('2020-06-14',-6),cart_count,0)) cart_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-29),cart_count,0)) cart_last_30d_count,
        sum(cart_count) cart_count,
        sum(if(dt='2020-06-14',favor_count,0)) favor_last_1d_count,
        sum(if(dt>=date_add('2020-06-14',-6),favor_count,0)) favor_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-29),favor_count,0)) favor_last_30d_count,
        sum(favor_count) favor_count,
        sum(if(dt='2020-06-14',appraise_good_count,0)) appraise_last_1d_good_count,
        sum(if(dt='2020-06-14',appraise_mid_count,0)) appraise_last_1d_mid_count,
        sum(if(dt='2020-06-14',appraise_bad_count,0)) appraise_last_1d_bad_count,
        sum(if(dt='2020-06-14',appraise_default_count,0)) appraise_last_1d_default_count,
        sum(if(dt>=date_add('2020-06-14',-6),appraise_good_count,0)) appraise_last_7d_good_count,
        sum(if(dt>=date_add('2020-06-14',-6),appraise_mid_count,0)) appraise_last_7d_mid_count,
        sum(if(dt>=date_add('2020-06-14',-6),appraise_bad_count,0)) appraise_last_7d_bad_count,
        sum(if(dt>=date_add('2020-06-14',-6),appraise_default_count,0)) appraise_last_7d_default_count,
        sum(if(dt>=date_add('2020-06-14',-29),appraise_good_count,0)) appraise_last_30d_good_count,
        sum(if(dt>=date_add('2020-06-14',-29),appraise_mid_count,0)) appraise_last_30d_mid_count,
        sum(if(dt>=date_add('2020-06-14',-29),appraise_bad_count,0)) appraise_last_30d_bad_count,
        sum(if(dt>=date_add('2020-06-14',-29),appraise_default_count,0)) appraise_last_30d_default_count,
        sum(appraise_good_count) appraise_good_count,
        sum(appraise_mid_count) appraise_mid_count,
        sum(appraise_bad_count) appraise_bad_count,
        sum(appraise_default_count) appraise_default_count
    from dws_sku_action_daycount
    group by sku_id
)t2
on t1.id=t2.sku_id;
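The first-day load picks each window with conditions such as `dt>=date_add('2020-06-14',-6)` and `dt>=date_add('2020-06-14',-29)`, so the 7-day and 30-day windows both include the load date itself. As a rough sketch of which partitions that covers, the bash snippet below computes the window start dates. It assumes GNU `date` (for the `-d` option); `window_start` is a hypothetical helper name, not something defined in the original notes.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original notes):
# first day of an N-day window that ends on END_DATE, both ends inclusive.
# Requires GNU date; -u keeps the arithmetic in UTC to avoid DST surprises.
window_start() {
    local end_date=$1 days=$2
    date -u -d "$end_date -$((days - 1)) day" +%F
}

do_date=2020-06-14
echo "1d window:  $(window_start "$do_date" 1) .. $do_date"
echo "7d window:  $(window_start "$do_date" 7) .. $do_date"    # starts 2020-06-08
echo "30d window: $(window_start "$do_date" 30) .. $do_date"   # starts 2020-05-16
```

The offsets are N-1 rather than N because the load date counts as the first day of each window.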
Daily load
insert overwrite table dwt_sku_topic partition(dt='2020-06-15')
select
    nvl(1d_ago.sku_id,old.sku_id),
    nvl(1d_ago.order_count,0),
    nvl(1d_ago.order_num,0),
    nvl(1d_ago.order_activity_count,0),
    nvl(1d_ago.order_coupon_count,0),
    nvl(1d_ago.order_activity_reduce_amount,0.0),
    nvl(1d_ago.order_coupon_reduce_amount,0.0),
    nvl(1d_ago.order_original_amount,0.0),
    nvl(1d_ago.order_final_amount,0.0),
    nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)-nvl(7d_ago.order_count,0),
    nvl(old.order_last_7d_num,0)+nvl(1d_ago.order_num,0)-nvl(7d_ago.order_num,0),
    nvl(old.order_activity_last_7d_count,0)+nvl(1d_ago.order_activity_count,0)-nvl(7d_ago.order_activity_count,0),
    nvl(old.order_coupon_last_7d_count,0)+nvl(1d_ago.order_coupon_count,0)-nvl(7d_ago.order_coupon_count,0),
    nvl(old.order_activity_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)-nvl(7d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)-nvl(7d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(7d_ago.order_original_amount,0.0),
    nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(7d_ago.order_final_amount,0.0),
    nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)-nvl(30d_ago.order_count,0),
    nvl(old.order_last_30d_num,0)+nvl(1d_ago.order_num,0)-nvl(30d_ago.order_num,0),
    nvl(old.order_activity_last_30d_count,0)+nvl(1d_ago.order_activity_count,0)-nvl(30d_ago.order_activity_count,0),
    nvl(old.order_coupon_last_30d_count,0)+nvl(1d_ago.order_coupon_count,0)-nvl(30d_ago.order_coupon_count,0),
    nvl(old.order_activity_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)-nvl(30d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)-nvl(30d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(30d_ago.order_original_amount,0.0),
    nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(30d_ago.order_final_amount,0.0),
    nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
    nvl(old.order_num,0)+nvl(1d_ago.order_num,0),
    nvl(old.order_activity_count,0)+nvl(1d_ago.order_activity_count,0),
    nvl(old.order_coupon_count,0)+nvl(1d_ago.order_coupon_count,0),
    nvl(old.order_activity_reduce_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_reduce_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
    nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
    nvl(1d_ago.payment_count,0),
    nvl(1d_ago.payment_num,0),
    nvl(1d_ago.payment_amount,0.0),
    nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)-nvl(7d_ago.payment_count,0),
    nvl(old.payment_last_7d_num,0)+nvl(1d_ago.payment_num,0)-nvl(7d_ago.payment_num,0),
    nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(7d_ago.payment_amount,0.0),
    nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)-nvl(30d_ago.payment_count,0),
    nvl(old.payment_last_30d_num,0)+nvl(1d_ago.payment_num,0)-nvl(30d_ago.payment_num,0),
    nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(30d_ago.payment_amount,0.0),
    nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
    nvl(old.payment_num,0)+nvl(1d_ago.payment_num,0),
    nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
    nvl(1d_ago.refund_order_count,0),
    nvl(1d_ago.refund_order_num,0),
    nvl(1d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)-nvl(7d_ago.refund_order_count,0),
    nvl(old.refund_order_last_7d_num,0)+nvl(1d_ago.refund_order_num,0)-nvl(7d_ago.refund_order_num,0),
    nvl(old.refund_order_last_7d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)-nvl(7d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)-nvl(30d_ago.refund_order_count,0),
    nvl(old.refund_order_last_30d_num,0)+nvl(1d_ago.refund_order_num,0)-nvl(30d_ago.refund_order_num,0),
    nvl(old.refund_order_last_30d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)-nvl(30d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
    nvl(old.refund_order_num,0)+nvl(1d_ago.refund_order_num,0),
    nvl(old.refund_order_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0),
    nvl(1d_ago.refund_payment_count,0),
    nvl(1d_ago.refund_payment_num,0),
    nvl(1d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(7d_ago.refund_payment_count,0),
    nvl(old.refund_payment_last_7d_num,0)+nvl(1d_ago.refund_payment_num,0)-nvl(7d_ago.refund_payment_num,0),
    nvl(old.refund_payment_last_7d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)-nvl(7d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(30d_ago.refund_payment_count,0),
    nvl(old.refund_payment_last_30d_num,0)+nvl(1d_ago.refund_payment_num,0)-nvl(30d_ago.refund_payment_num,0),
    nvl(old.refund_payment_last_30d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)-nvl(30d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
    nvl(old.refund_payment_num,0)+nvl(1d_ago.refund_payment_num,0),
    nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0),
    nvl(1d_ago.cart_count,0),
    nvl(old.cart_last_7d_count,0)+nvl(1d_ago.cart_count,0)-nvl(7d_ago.cart_count,0),
    nvl(old.cart_last_30d_count,0)+nvl(1d_ago.cart_count,0)-nvl(30d_ago.cart_count,0),
    nvl(old.cart_count,0)+nvl(1d_ago.cart_count,0),
    nvl(1d_ago.favor_count,0),
    nvl(old.favor_last_7d_count,0)+nvl(1d_ago.favor_count,0)-nvl(7d_ago.favor_count,0),
    nvl(old.favor_last_30d_count,0)+nvl(1d_ago.favor_count,0)-nvl(30d_ago.favor_count,0),
    nvl(old.favor_count,0)+nvl(1d_ago.favor_count,0),
    nvl(1d_ago.appraise_good_count,0),
    nvl(1d_ago.appraise_mid_count,0),
    nvl(1d_ago.appraise_bad_count,0),
    nvl(1d_ago.appraise_default_count,0),
    nvl(old.appraise_last_7d_good_count,0)+nvl(1d_ago.appraise_good_count,0)-nvl(7d_ago.appraise_good_count,0),
    nvl(old.appraise_last_7d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(7d_ago.appraise_mid_count,0),
    nvl(old.appraise_last_7d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(7d_ago.appraise_bad_count,0),
    nvl(old.appraise_last_7d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(7d_ago.appraise_default_count,0),
    nvl(old.appraise_last_30d_good_count,0)+nvl(1d_ago.appraise_good_count,0)-nvl(30d_ago.appraise_good_count,0),
    nvl(old.appraise_last_30d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(30d_ago.appraise_mid_count,0),
    nvl(old.appraise_last_30d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(30d_ago.appraise_bad_count,0),
    nvl(old.appraise_last_30d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(30d_ago.appraise_default_count,0),
    nvl(old.appraise_good_count,0)+nvl(1d_ago.appraise_good_count,0),
    nvl(old.appraise_mid_count,0)+nvl(1d_ago.appraise_mid_count,0),
    nvl(old.appraise_bad_count,0)+nvl(1d_ago.appraise_bad_count,0),
    nvl(old.appraise_default_count,0)+nvl(1d_ago.appraise_default_count,0)
from
(
    select
        sku_id,
        order_last_1d_count, order_last_1d_num, order_activity_last_1d_count, order_coupon_last_1d_count,
        order_activity_reduce_last_1d_amount, order_coupon_reduce_last_1d_amount, order_last_1d_original_amount, order_last_1d_final_amount,
        order_last_7d_count, order_last_7d_num, order_activity_last_7d_count, order_coupon_last_7d_count,
        order_activity_reduce_last_7d_amount, order_coupon_reduce_last_7d_amount, order_last_7d_original_amount, order_last_7d_final_amount,
        order_last_30d_count, order_last_30d_num, order_activity_last_30d_count, order_coupon_last_30d_count,
        order_activity_reduce_last_30d_amount, order_coupon_reduce_last_30d_amount, order_last_30d_original_amount, order_last_30d_final_amount,
        order_count, order_num, order_activity_count, order_coupon_count,
        order_activity_reduce_amount, order_coupon_reduce_amount, order_original_amount, order_final_amount,
        payment_last_1d_count, payment_last_1d_num, payment_last_1d_amount,
        payment_last_7d_count, payment_last_7d_num, payment_last_7d_amount,
        payment_last_30d_count, payment_last_30d_num, payment_last_30d_amount,
        payment_count, payment_num, payment_amount,
        refund_order_last_1d_count, refund_order_last_1d_num, refund_order_last_1d_amount,
        refund_order_last_7d_count, refund_order_last_7d_num, refund_order_last_7d_amount,
        refund_order_last_30d_count, refund_order_last_30d_num, refund_order_last_30d_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        refund_payment_last_1d_count, refund_payment_last_1d_num, refund_payment_last_1d_amount,
        refund_payment_last_7d_count, refund_payment_last_7d_num, refund_payment_last_7d_amount,
        refund_payment_last_30d_count, refund_payment_last_30d_num, refund_payment_last_30d_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        cart_last_1d_count, cart_last_7d_count, cart_last_30d_count, cart_count,
        favor_last_1d_count, favor_last_7d_count, favor_last_30d_count, favor_count,
        appraise_last_1d_good_count, appraise_last_1d_mid_count, appraise_last_1d_bad_count, appraise_last_1d_default_count,
        appraise_last_7d_good_count, appraise_last_7d_mid_count, appraise_last_7d_bad_count, appraise_last_7d_default_count,
        appraise_last_30d_good_count, appraise_last_30d_mid_count, appraise_last_30d_bad_count, appraise_last_30d_default_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from dwt_sku_topic
    where dt=date_add('2020-06-15',-1)
)old
full outer join
(
    select
        sku_id,
        order_count, order_num, order_activity_count, order_coupon_count,
        order_activity_reduce_amount, order_coupon_reduce_amount, order_original_amount, order_final_amount,
        payment_count, payment_num, payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        cart_count, favor_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from dws_sku_action_daycount
    where dt='2020-06-15'
)1d_ago
on old.sku_id=1d_ago.sku_id
left join
(
    select
        sku_id,
        order_count, order_num, order_activity_count, order_coupon_count,
        order_activity_reduce_amount, order_coupon_reduce_amount, order_original_amount, order_final_amount,
        payment_count, payment_num, payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        cart_count, favor_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from dws_sku_action_daycount
    where dt=date_add('2020-06-15',-7)
)7d_ago
on old.sku_id=7d_ago.sku_id
left join
(
    select
        sku_id,
        order_count, order_num, order_activity_count, order_coupon_count,
        order_activity_reduce_amount, order_coupon_reduce_amount, order_original_amount, order_final_amount,
        payment_count, payment_num, payment_amount,
        refund_order_count, refund_order_num, refund_order_amount,
        refund_payment_count, refund_payment_num, refund_payment_amount,
        cart_count, favor_count,
        appraise_good_count, appraise_mid_count, appraise_bad_count, appraise_default_count
    from dws_sku_action_daycount
    where dt=date_add('2020-06-15',-30)
)30d_ago
on old.sku_id=30d_ago.sku_id;
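The daily load avoids rescanning 30 days of dws_sku_action_daycount. Each 7-day (or 30-day) value is yesterday's window total, plus today's count, minus the count of the day that just fell out of the window. For the load date 2020-06-15 that expiring day is 2020-06-08, which is `date_add('2020-06-15',-7)`. The bash sketch below illustrates the arithmetic with made-up counts; `roll_window` is a hypothetical helper, and it only handles integers, whereas the amount columns in the table are decimals.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original notes), mirroring
#   nvl(old.x_last_7d,0) + nvl(1d_ago.x,0) - nvl(7d_ago.x,0)
# An empty argument stands in for NULL and is treated as 0, like nvl(...,0).
roll_window() {
    local old=${1:-0} today=${2:-0} expired=${3:-0}
    echo $(( old + today - expired ))
}

# Made-up counts, for illustration only.
roll_window 25 4 3      # window total 25, +4 today, -3 expired: prints 26
roll_window "" 4 ""     # SKU with no earlier rows: prints 4
```

The last-1-day columns need no such arithmetic: they are simply today's values from the `1d_ago` subquery.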
# Coupon topic
Table creation statement
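DROP TABLE IF EXISTS dwt_coupon_topic;
CREATE EXTERNAL TABLE dwt_coupon_topic
(
    `coupon_id` STRING COMMENT 'Coupon ID',
    `get_last_1d_count` BIGINT COMMENT 'Times claimed in the last 1 day',
    `get_last_7d_count` BIGINT COMMENT 'Times claimed in the last 7 days',
    `get_last_30d_count` BIGINT COMMENT 'Times claimed in the last 30 days',
    `get_count` BIGINT COMMENT 'Cumulative times claimed',
    `order_last_1d_count` BIGINT COMMENT 'Orders placed with the coupon in the last 1 day',
    `order_last_1d_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount of orders placed with the coupon in the last 1 day',
    `order_last_1d_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders placed with the coupon in the last 1 day',
    `order_last_1d_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders placed with the coupon in the last 1 day',
    `order_last_7d_count` BIGINT COMMENT 'Orders placed with the coupon in the last 7 days',
    `order_last_7d_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount of orders placed with the coupon in the last 7 days',
    `order_last_7d_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders placed with the coupon in the last 7 days',
    `order_last_7d_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders placed with the coupon in the last 7 days',
    `order_last_30d_count` BIGINT COMMENT 'Orders placed with the coupon in the last 30 days',
    `order_last_30d_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount of orders placed with the coupon in the last 30 days',
    `order_last_30d_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders placed with the coupon in the last 30 days',
    `order_last_30d_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders placed with the coupon in the last 30 days',
    `order_count` BIGINT COMMENT 'Cumulative number of uses (orders)',
    `order_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative discount amount of orders placed with the coupon',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Cumulative original amount of orders placed with the coupon',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Cumulative final amount of orders placed with the coupon',
    `payment_last_1d_count` BIGINT COMMENT 'Payments made with the coupon in the last 1 day',
    `payment_last_1d_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount from the coupon in the last 1 day',
    `payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Amount paid with the coupon in the last 1 day',
    `payment_last_7d_count` BIGINT COMMENT 'Payments made with the coupon in the last 7 days',
    `payment_last_7d_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount from the coupon in the last 7 days',
    `payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Amount paid with the coupon in the last 7 days',
    `payment_last_30d_count` BIGINT COMMENT 'Payments made with the coupon in the last 30 days',
    `payment_last_30d_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount from the coupon in the last 30 days',
    `payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Amount paid with the coupon in the last 30 days',
    `payment_count` BIGINT COMMENT 'Cumulative number of uses (payments)',
    `payment_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative discount amount from the coupon',
    `payment_amount` DECIMAL(16,2) COMMENT 'Cumulative amount paid with the coupon',
    `expire_last_1d_count` BIGINT COMMENT 'Times expired in the last 1 day',
    `expire_last_7d_count` BIGINT COMMENT 'Times expired in the last 7 days',
    `expire_last_30d_count` BIGINT COMMENT 'Times expired in the last 30 days',
    `expire_count` BIGINT COMMENT 'Cumulative times expired'
) COMMENT 'Coupon topic table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwt/dwt_coupon_topic/'
TBLPROPERTIES ("parquet.compression"="lzo");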
Data loading
First-day load
insert overwrite table dwt_coupon_topic partition(dt='2020-06-14')
select
    id,
    nvl(get_last_1d_count,0),
    nvl(get_last_7d_count,0),
    nvl(get_last_30d_count,0),
    nvl(get_count,0),
    nvl(order_last_1d_count,0),
    nvl(order_last_1d_reduce_amount,0),
    nvl(order_last_1d_original_amount,0),
    nvl(order_last_1d_final_amount,0),
    nvl(order_last_7d_count,0),
    nvl(order_last_7d_reduce_amount,0),
    nvl(order_last_7d_original_amount,0),
    nvl(order_last_7d_final_amount,0),
    nvl(order_last_30d_count,0),
    nvl(order_last_30d_reduce_amount,0),
    nvl(order_last_30d_original_amount,0),
    nvl(order_last_30d_final_amount,0),
    nvl(order_count,0),
    nvl(order_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_last_1d_count,0),
    nvl(payment_last_1d_reduce_amount,0),
    nvl(payment_last_1d_amount,0),
    nvl(payment_last_7d_count,0),
    nvl(payment_last_7d_reduce_amount,0),
    nvl(payment_last_7d_amount,0),
    nvl(payment_last_30d_count,0),
    nvl(payment_last_30d_reduce_amount,0),
    nvl(payment_last_30d_amount,0),
    nvl(payment_count,0),
    nvl(payment_reduce_amount,0),
    nvl(payment_amount,0),
    nvl(expire_last_1d_count,0),
    nvl(expire_last_7d_count,0),
    nvl(expire_last_30d_count,0),
    nvl(expire_count,0)
from
(
    select
        id
    from dim_coupon_info
    where dt='2020-06-14'
)t1
left join
(
    select
        coupon_id coupon_id,
        sum(if(dt='2020-06-14',get_count,0)) get_last_1d_count,
        sum(if(dt>=date_add('2020-06-14',-6),get_count,0)) get_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-29),get_count,0)) get_last_30d_count,
        sum(get_count) get_count,
        sum(if(dt='2020-06-14',order_count,0)) order_last_1d_count,
        sum(if(dt='2020-06-14',order_reduce_amount,0)) order_last_1d_reduce_amount,
        sum(if(dt='2020-06-14',order_original_amount,0)) order_last_1d_original_amount,
        sum(if(dt='2020-06-14',order_final_amount,0)) order_last_1d_final_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_count,0)) order_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),order_reduce_amount,0)) order_last_7d_reduce_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_original_amount,0)) order_last_7d_original_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_final_amount,0)) order_last_7d_final_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_count,0)) order_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),order_reduce_amount,0)) order_last_30d_reduce_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_original_amount,0)) order_last_30d_original_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_final_amount,0)) order_last_30d_final_amount,
        sum(order_count) order_count,
        sum(order_reduce_amount) order_reduce_amount,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(if(dt='2020-06-14',payment_count,0)) payment_last_1d_count,
        sum(if(dt='2020-06-14',payment_reduce_amount,0)) payment_last_1d_reduce_amount,
        sum(if(dt='2020-06-14',payment_amount,0)) payment_last_1d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),payment_count,0)) payment_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),payment_reduce_amount,0)) payment_last_7d_reduce_amount,
        sum(if(dt>=date_add('2020-06-14',-6),payment_amount,0)) payment_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),payment_count,0)) payment_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),payment_reduce_amount,0)) payment_last_30d_reduce_amount,
        sum(if(dt>=date_add('2020-06-14',-29),payment_amount,0)) payment_last_30d_amount,
        sum(payment_count) payment_count,
        sum(payment_reduce_amount) payment_reduce_amount,
        sum(payment_amount) payment_amount,
        sum(if(dt='2020-06-14',expire_count,0)) expire_last_1d_count,
        sum(if(dt>=date_add('2020-06-14',-6),expire_count,0)) expire_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-29),expire_count,0)) expire_last_30d_count,
        sum(expire_count) expire_count
    from dws_coupon_info_daycount
    group by coupon_id
)t2
on t1.id=t2.coupon_id;
Daily load
insert overwrite table dwt_coupon_topic partition(dt='2020-06-15')
select
    nvl(1d_ago.coupon_id,old.coupon_id),
    nvl(1d_ago.get_count,0),
    nvl(old.get_last_7d_count,0)+nvl(1d_ago.get_count,0)-nvl(7d_ago.get_count,0),
    nvl(old.get_last_30d_count,0)+nvl(1d_ago.get_count,0)-nvl(30d_ago.get_count,0),
    nvl(old.get_count,0)+nvl(1d_ago.get_count,0),
    nvl(1d_ago.order_count,0),
    nvl(1d_ago.order_reduce_amount,0.0),
    nvl(1d_ago.order_original_amount,0.0),
    nvl(1d_ago.order_final_amount,0.0),
    nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)-nvl(7d_ago.order_count,0),
    nvl(old.order_last_7d_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0)-nvl(7d_ago.order_reduce_amount,0.0),
    nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(7d_ago.order_original_amount,0.0),
    nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(7d_ago.order_final_amount,0.0),
    nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)-nvl(30d_ago.order_count,0),
    nvl(old.order_last_30d_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0)-nvl(30d_ago.order_reduce_amount,0.0),
    nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(30d_ago.order_original_amount,0.0),
    nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(30d_ago.order_final_amount,0.0),
    nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
    nvl(old.order_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0),
    nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
    nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
    -- the last-1-day payment metrics come straight from the current day's DWS row
    nvl(1d_ago.payment_count,0),
    nvl(1d_ago.payment_reduce_amount,0.0),
    nvl(1d_ago.payment_amount,0.0),
    nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)-nvl(7d_ago.payment_count,0),
    nvl(old.payment_last_7d_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0)-nvl(7d_ago.payment_reduce_amount,0.0),
    nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(7d_ago.payment_amount,0.0),
    nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)-nvl(30d_ago.payment_count,0),
    nvl(old.payment_last_30d_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0)-nvl(30d_ago.payment_reduce_amount,0.0),
    nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(30d_ago.payment_amount,0.0),
    nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
    nvl(old.payment_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0),
    nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
    nvl(1d_ago.expire_count,0),
    nvl(old.expire_last_7d_count,0)+nvl(1d_ago.expire_count,0)-nvl(7d_ago.expire_count,0),
    nvl(old.expire_last_30d_count,0)+nvl(1d_ago.expire_count,0)-nvl(30d_ago.expire_count,0),
    nvl(old.expire_count,0)+nvl(1d_ago.expire_count,0)
from
(
    select
        coupon_id,
        get_last_1d_count,
        get_last_7d_count,
        get_last_30d_count,
        get_count,
        order_last_1d_count,
        order_last_1d_reduce_amount,
        order_last_1d_original_amount,
        order_last_1d_final_amount,
        order_last_7d_count,
        order_last_7d_reduce_amount,
        order_last_7d_original_amount,
        order_last_7d_final_amount,
        order_last_30d_count,
        order_last_30d_reduce_amount,
        order_last_30d_original_amount,
        order_last_30d_final_amount,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_last_1d_count,
        payment_last_1d_reduce_amount,
        payment_last_1d_amount,
        payment_last_7d_count,
        payment_last_7d_reduce_amount,
        payment_last_7d_amount,
        payment_last_30d_count,
        payment_last_30d_reduce_amount,
        payment_last_30d_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount,
        expire_last_1d_count,
        expire_last_7d_count,
        expire_last_30d_count,
        expire_count
    from dwt_coupon_topic
    where dt=date_add('2020-06-15',-1)
)old
full outer join
(
    select
        coupon_id,
        get_count,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount,
        expire_count
    from dws_coupon_info_daycount
    where dt='2020-06-15'
)1d_ago
on old.coupon_id=1d_ago.coupon_id
left join
(
    select
        coupon_id,
        get_count,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount,
        expire_count
    from dws_coupon_info_daycount
    where dt=date_add('2020-06-15',-7)
)7d_ago
on old.coupon_id=7d_ago.coupon_id
left join
(
    select
        coupon_id,
        get_count,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount,
        expire_count
    from dws_coupon_info_daycount
    where dt=date_add('2020-06-15',-30)
)30d_ago
on old.coupon_id=30d_ago.coupon_id;
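The daily load above avoids rescanning 7 or 30 days of DWS data: each rolling metric is yesterday's window value, plus today's value, minus the value of the day that just dropped out of the window. The following is a minimal shell sketch of that identity only, not part of the warehouse scripts. The function names `window_sum` and `rolling_update` and all the numbers are made up for illustration.

```shell
#!/bin/sh
# window_sum WIDTH V1 V2 ...: full recomputation, sums the last WIDTH values.
window_sum() {
  width=$1; shift
  skip=$(( $# - width ))   # how many leading values fall outside the window
  sum=0; i=0
  for v in "$@"; do
    i=$(( i + 1 ))
    if [ "$i" -gt "$skip" ]; then sum=$(( sum + v )); fi
  done
  echo "$sum"
}

# rolling_update OLD TODAY EXPIRED: incremental update, mirrors
# nvl(old.x_last_7d,0) + nvl(1d_ago.x,0) - nvl(7d_ago.x,0).
rolling_update() {
  echo $(( $1 + $2 - $3 ))
}

# Hypothetical daily get_count values for one coupon, days 1..8.
old_7d=$(window_sum 7 3 0 5 2 7 1 4)        # window over days 1..7
new_7d=$(rolling_update "$old_7d" 6 3)      # day 8 (6) enters, day 1 (3) leaves
full_7d=$(window_sum 7 3 0 5 2 7 1 4 6)     # recomputed window over days 2..8
echo "incremental=$new_7d full=$full_7d"
```

Both numbers agree, which is why the subquery for the expired day reads `date_add('2020-06-15',-7)` while the first-day load uses `-6` as the lower bound of the window.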
# Activity topic
Table creation statement
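DROP TABLE IF EXISTS dwt_activity_topic;
CREATE EXTERNAL TABLE dwt_activity_topic(
    `activity_rule_id` STRING COMMENT 'Activity rule ID',
    `activity_id` STRING COMMENT 'Activity ID',
    `order_last_1d_count` BIGINT COMMENT 'Orders placed under this activity rule in the last 1 day',
    `order_last_1d_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount of orders placed under this activity rule in the last 1 day',
    `order_last_1d_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders placed under this activity rule in the last 1 day',
    `order_last_1d_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders placed under this activity rule in the last 1 day',
    `order_count` BIGINT COMMENT 'Cumulative orders placed under this activity rule',
    `order_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative discount amount of orders placed under this activity rule',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Cumulative original amount of orders placed under this activity rule',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Cumulative final amount of orders placed under this activity rule',
    `payment_last_1d_count` BIGINT COMMENT 'Payments made under this activity rule in the last 1 day',
    `payment_last_1d_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount of payments made under this activity rule in the last 1 day',
    `payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Amount paid under this activity rule in the last 1 day',
    `payment_count` BIGINT COMMENT 'Cumulative payments made under this activity rule',
    `payment_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative discount amount of payments made under this activity rule',
    `payment_amount` DECIMAL(16,2) COMMENT 'Cumulative amount paid under this activity rule'
) COMMENT 'Activity topic wide table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwt/dwt_activity_topic/'
TBLPROPERTIES ("parquet.compression"="lzo");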
Data loading
First-day load
insert overwrite table dwt_activity_topic partition(dt='2020-06-14')
select
    t1.activity_rule_id,
    t1.activity_id,
    nvl(order_last_1d_count,0),
    nvl(order_last_1d_reduce_amount,0),
    nvl(order_last_1d_original_amount,0),
    nvl(order_last_1d_final_amount,0),
    nvl(order_count,0),
    nvl(order_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_last_1d_count,0),
    nvl(payment_last_1d_reduce_amount,0),
    nvl(payment_last_1d_amount,0),
    nvl(payment_count,0),
    nvl(payment_reduce_amount,0),
    nvl(payment_amount,0)
from
(
    select
        activity_rule_id,
        activity_id
    from dim_activity_rule_info
    where dt='2020-06-14'
)t1
left join
(
    select
        activity_rule_id,
        activity_id,
        sum(if(dt='2020-06-14',order_count,0)) order_last_1d_count,
        sum(if(dt='2020-06-14',order_reduce_amount,0)) order_last_1d_reduce_amount,
        sum(if(dt='2020-06-14',order_original_amount,0)) order_last_1d_original_amount,
        sum(if(dt='2020-06-14',order_final_amount,0)) order_last_1d_final_amount,
        sum(order_count) order_count,
        sum(order_reduce_amount) order_reduce_amount,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(if(dt='2020-06-14',payment_count,0)) payment_last_1d_count,
        sum(if(dt='2020-06-14',payment_reduce_amount,0)) payment_last_1d_reduce_amount,
        sum(if(dt='2020-06-14',payment_amount,0)) payment_last_1d_amount,
        sum(payment_count) payment_count,
        sum(payment_reduce_amount) payment_reduce_amount,
        sum(payment_amount) payment_amount
    from dws_activity_info_daycount
    group by activity_rule_id,activity_id
)t2
on t1.activity_rule_id=t2.activity_rule_id
and t1.activity_id=t2.activity_id;
Daily load
insert overwrite table dwt_activity_topic partition(dt='2020-06-15')
select
    nvl(1d_ago.activity_rule_id,old.activity_rule_id),
    nvl(1d_ago.activity_id,old.activity_id),
    nvl(1d_ago.order_count,0),
    nvl(1d_ago.order_reduce_amount,0.0),
    nvl(1d_ago.order_original_amount,0.0),
    nvl(1d_ago.order_final_amount,0.0),
    nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
    nvl(old.order_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0),
    nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
    nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
    nvl(1d_ago.payment_count,0),
    nvl(1d_ago.payment_reduce_amount,0.0),
    nvl(1d_ago.payment_amount,0.0),
    nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
    nvl(old.payment_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0),
    nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0)
from
(
    select
        activity_rule_id,
        activity_id,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount
    from dwt_activity_topic
    where dt=date_add('2020-06-15',-1)
)old
full outer join
(
    select
        activity_rule_id,
        activity_id,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount
    from dws_activity_info_daycount
    where dt='2020-06-15'
)1d_ago
on old.activity_rule_id=1d_ago.activity_rule_id;
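The activity daily load relies on two things: the full outer join keeps every key that appears on either side (yesterday's DWT row or today's DWS row), and `nvl(x,0)` turns the missing side into 0 before adding. The following is a rough shell sketch of that merge for a single cumulative counter. The keys `rule_1` to `rule_3`, the numbers, and the helper names `lookup` and `merge_counts` are all hypothetical.

```shell
#!/bin/sh
# Hypothetical inputs as "key=value" lists.
old_total="rule_1=10 rule_2=4"   # yesterday's cumulative order_count (DWT)
today="rule_2=3 rule_3=5"        # today's order_count (DWS)

# lookup KEY LIST: prints the value stored for KEY, or 0 when KEY is absent,
# which plays the role of nvl(x,0).
lookup() {
  for pair in $2; do
    if [ "${pair%%=*}" = "$1" ]; then
      echo "${pair#*=}"
      return
    fi
  done
  echo 0
}

# merge_counts KEY: prints "last_1d cumulative" for one key.
merge_counts() {
  prev=$(lookup "$1" "$old_total")
  cur=$(lookup "$1" "$today")
  echo "$cur $(( prev + cur ))"
}

# The union of keys from both sides corresponds to the rows of a full outer join.
for key in $(printf '%s\n' $old_total $today | cut -d= -f1 | sort -u); do
  echo "$key $(merge_counts "$key")"
done
```

A key seen only yesterday keeps its cumulative value with a last-1-day value of 0, and a key seen only today starts its cumulative value from today's count.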
# Area topic
Table creation statement
DROP TABLE IF EXISTS dwt_area_topic;
CREATE EXTERNAL TABLE dwt_area_topic(
    `province_id` STRING COMMENT 'Province ID',
    `visit_last_1d_count` BIGINT COMMENT 'Visitor visits in the last 1 day',
    `login_last_1d_count` BIGINT COMMENT 'User visits in the last 1 day',
    `visit_last_7d_count` BIGINT COMMENT 'Visitor visits in the last 7 days',
    `login_last_7d_count` BIGINT COMMENT 'User visits in the last 7 days',
    `visit_last_30d_count` BIGINT COMMENT 'Visitor visits in the last 30 days',
    `login_last_30d_count` BIGINT COMMENT 'User visits in the last 30 days',
    `visit_count` BIGINT COMMENT 'Cumulative visitor visits',
    `login_count` BIGINT COMMENT 'Cumulative user visits',
    `order_last_1d_count` BIGINT COMMENT 'Orders in the last 1 day',
    `order_last_1d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 1 day',
    `order_last_1d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 1 day',
    `order_last_7d_count` BIGINT COMMENT 'Orders in the last 7 days',
    `order_last_7d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 7 days',
    `order_last_7d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 7 days',
    `order_last_30d_count` BIGINT COMMENT 'Orders in the last 30 days',
    `order_last_30d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 30 days',
    `order_last_30d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 30 days',
    `order_count` BIGINT COMMENT 'Cumulative orders',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Cumulative original order amount',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Cumulative final order amount',
    `payment_last_1d_count` BIGINT COMMENT 'Payments in the last 1 day',
    `payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 1 day',
    `payment_last_7d_count` BIGINT COMMENT 'Payments in the last 7 days',
    `payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 7 days',
    `payment_last_30d_count` BIGINT COMMENT 'Payments in the last 30 days',
    `payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 30 days',
    `payment_count` BIGINT COMMENT 'Cumulative payments',
    `payment_amount` DECIMAL(16,2) COMMENT 'Cumulative payment amount',
    `refund_order_last_1d_count` BIGINT COMMENT 'Order refunds in the last 1 day',
    `refund_order_last_1d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 1 day',
    `refund_order_last_7d_count` BIGINT COMMENT 'Order refunds in the last 7 days',
    `refund_order_last_7d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 7 days',
    `refund_order_last_30d_count` BIGINT COMMENT 'Order refunds in the last 30 days',
    `refund_order_last_30d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 30 days',
    `refund_order_count` BIGINT COMMENT 'Cumulative order refunds',
    `refund_order_amount` DECIMAL(16,2) COMMENT 'Cumulative order refund amount',
    `refund_payment_last_1d_count` BIGINT COMMENT 'Payment refunds in the last 1 day',
    `refund_payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Payment refund amount in the last 1 day',
    `refund_payment_last_7d_count` BIGINT COMMENT 'Payment refunds in the last 7 days',
    `refund_payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Payment refund amount in the last 7 days',
    `refund_payment_last_30d_count` BIGINT COMMENT 'Payment refunds in the last 30 days',
    `refund_payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Payment refund amount in the last 30 days',
    `refund_payment_count` BIGINT COMMENT 'Cumulative payment refunds',
    `refund_payment_amount` DECIMAL(16,2) COMMENT 'Cumulative payment refund amount'
) COMMENT 'Area topic wide table'
PARTITIONED BY (`dt` STRING)
STORED AS PARQUET
LOCATION '/warehouse/gmall/dwt/dwt_area_topic/'
TBLPROPERTIES ("parquet.compression"="lzo");
Data loading
First-day load
insert overwrite table dwt_area_topic partition(dt='2020-06-14')
select
    id,
    nvl(visit_last_1d_count,0),
    nvl(login_last_1d_count,0),
    nvl(visit_last_7d_count,0),
    nvl(login_last_7d_count,0),
    nvl(visit_last_30d_count,0),
    nvl(login_last_30d_count,0),
    nvl(visit_count,0),
    nvl(login_count,0),
    nvl(order_last_1d_count,0),
    nvl(order_last_1d_original_amount,0),
    nvl(order_last_1d_final_amount,0),
    nvl(order_last_7d_count,0),
    nvl(order_last_7d_original_amount,0),
    nvl(order_last_7d_final_amount,0),
    nvl(order_last_30d_count,0),
    nvl(order_last_30d_original_amount,0),
    nvl(order_last_30d_final_amount,0),
    nvl(order_count,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    nvl(payment_last_1d_count,0),
    nvl(payment_last_1d_amount,0),
    nvl(payment_last_7d_count,0),
    nvl(payment_last_7d_amount,0),
    nvl(payment_last_30d_count,0),
    nvl(payment_last_30d_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_last_1d_count,0),
    nvl(refund_order_last_1d_amount,0),
    nvl(refund_order_last_7d_count,0),
    nvl(refund_order_last_7d_amount,0),
    nvl(refund_order_last_30d_count,0),
    nvl(refund_order_last_30d_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_last_1d_count,0),
    nvl(refund_payment_last_1d_amount,0),
    nvl(refund_payment_last_7d_count,0),
    nvl(refund_payment_last_7d_amount,0),
    nvl(refund_payment_last_30d_count,0),
    nvl(refund_payment_last_30d_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_amount,0)
from
(
    select
        id
    from dim_base_province
)t1
left join
(
    select
        province_id province_id,
        sum(if(dt='2020-06-14',visit_count,0)) visit_last_1d_count,
        sum(if(dt='2020-06-14',login_count,0)) login_last_1d_count,
        sum(if(dt>=date_add('2020-06-14',-6),visit_count,0)) visit_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),login_count,0)) login_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-29),visit_count,0)) visit_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),login_count,0)) login_last_30d_count,
        sum(visit_count) visit_count,
        sum(login_count) login_count,
        sum(if(dt='2020-06-14',order_count,0)) order_last_1d_count,
        sum(if(dt='2020-06-14',order_original_amount,0)) order_last_1d_original_amount,
        sum(if(dt='2020-06-14',order_final_amount,0)) order_last_1d_final_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_count,0)) order_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),order_original_amount,0)) order_last_7d_original_amount,
        sum(if(dt>=date_add('2020-06-14',-6),order_final_amount,0)) order_last_7d_final_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_count,0)) order_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),order_original_amount,0)) order_last_30d_original_amount,
        sum(if(dt>=date_add('2020-06-14',-29),order_final_amount,0)) order_last_30d_final_amount,
        sum(order_count) order_count,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(if(dt='2020-06-14',payment_count,0)) payment_last_1d_count,
        sum(if(dt='2020-06-14',payment_amount,0)) payment_last_1d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),payment_count,0)) payment_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),payment_amount,0)) payment_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),payment_count,0)) payment_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),payment_amount,0)) payment_last_30d_amount,
        sum(payment_count) payment_count,
        sum(payment_amount) payment_amount,
        sum(if(dt='2020-06-14',refund_order_count,0)) refund_order_last_1d_count,
        sum(if(dt='2020-06-14',refund_order_amount,0)) refund_order_last_1d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),refund_order_count,0)) refund_order_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),refund_order_amount,0)) refund_order_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),refund_order_count,0)) refund_order_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),refund_order_amount,0)) refund_order_last_30d_amount,
        sum(refund_order_count) refund_order_count,
        sum(refund_order_amount) refund_order_amount,
        sum(if(dt='2020-06-14',refund_payment_count,0)) refund_payment_last_1d_count,
        sum(if(dt='2020-06-14',refund_payment_amount,0)) refund_payment_last_1d_amount,
        sum(if(dt>=date_add('2020-06-14',-6),refund_payment_count,0)) refund_payment_last_7d_count,
        sum(if(dt>=date_add('2020-06-14',-6),refund_payment_amount,0)) refund_payment_last_7d_amount,
        sum(if(dt>=date_add('2020-06-14',-29),refund_payment_count,0)) refund_payment_last_30d_count,
        sum(if(dt>=date_add('2020-06-14',-29),refund_payment_amount,0)) refund_payment_last_30d_amount,
        sum(refund_payment_count) refund_payment_count,
        sum(refund_payment_amount) refund_payment_amount
    from dws_area_stats_daycount
    group by province_id
)t2
on t1.id=t2.province_id;
Daily load
insert overwrite table dwt_area_topic partition(dt='2020-06-15')
select
    nvl(old.province_id,1d_ago.province_id),
    nvl(1d_ago.visit_count,0),
    nvl(1d_ago.login_count,0),
    nvl(old.visit_last_7d_count,0)+nvl(1d_ago.visit_count,0)-nvl(7d_ago.visit_count,0),
    nvl(old.login_last_7d_count,0)+nvl(1d_ago.login_count,0)-nvl(7d_ago.login_count,0),
    nvl(old.visit_last_30d_count,0)+nvl(1d_ago.visit_count,0)-nvl(30d_ago.visit_count,0),
    nvl(old.login_last_30d_count,0)+nvl(1d_ago.login_count,0)-nvl(30d_ago.login_count,0),
    nvl(old.visit_count,0)+nvl(1d_ago.visit_count,0),
    nvl(old.login_count,0)+nvl(1d_ago.login_count,0),
    nvl(1d_ago.order_count,0),
    nvl(1d_ago.order_original_amount,0.0),
    nvl(1d_ago.order_final_amount,0.0),
    nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)-nvl(7d_ago.order_count,0),
    nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(7d_ago.order_original_amount,0.0),
    nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(7d_ago.order_final_amount,0.0),
    nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)-nvl(30d_ago.order_count,0),
    nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(30d_ago.order_original_amount,0.0),
    nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(30d_ago.order_final_amount,0.0),
    nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
    nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
    nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
    nvl(1d_ago.payment_count,0),
    nvl(1d_ago.payment_amount,0.0),
    nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)-nvl(7d_ago.payment_count,0),
    nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(7d_ago.payment_amount,0.0),
    nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)-nvl(30d_ago.payment_count,0),
    nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(30d_ago.payment_amount,0.0),
    nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
    nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
    nvl(1d_ago.refund_order_count,0),
    nvl(1d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)-nvl(7d_ago.refund_order_count,0),
    nvl(old.refund_order_last_7d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)-nvl(7d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)-nvl(30d_ago.refund_order_count,0),
    nvl(old.refund_order_last_30d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)-nvl(30d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
    nvl(old.refund_order_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0),
    nvl(1d_ago.refund_payment_count,0),
    nvl(1d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(7d_ago.refund_payment_count,0),
    nvl(old.refund_payment_last_7d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)-nvl(7d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(30d_ago.refund_payment_count,0),
    nvl(old.refund_payment_last_30d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)-nvl(30d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
    nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)
from
(
    select
        province_id,
        visit_last_1d_count,
        login_last_1d_count,
        visit_last_7d_count,
        login_last_7d_count,
        visit_last_30d_count,
        login_last_30d_count,
        visit_count,
        login_count,
        order_last_1d_count,
        order_last_1d_original_amount,
        order_last_1d_final_amount,
        order_last_7d_count,
        order_last_7d_original_amount,
        order_last_7d_final_amount,
        order_last_30d_count,
        order_last_30d_original_amount,
        order_last_30d_final_amount,
        order_count,
        order_original_amount,
        order_final_amount,
        payment_last_1d_count,
        payment_last_1d_amount,
        payment_last_7d_count,
        payment_last_7d_amount,
        payment_last_30d_count,
        payment_last_30d_amount,
        payment_count,
        payment_amount,
        refund_order_last_1d_count,
        refund_order_last_1d_amount,
        refund_order_last_7d_count,
        refund_order_last_7d_amount,
        refund_order_last_30d_count,
        refund_order_last_30d_amount,
        refund_order_count,
        refund_order_amount,
        refund_payment_last_1d_count,
        refund_payment_last_1d_amount,
        refund_payment_last_7d_count,
        refund_payment_last_7d_amount,
        refund_payment_last_30d_count,
        refund_payment_last_30d_amount,
        refund_payment_count,
        refund_payment_amount
    from dwt_area_topic
    where dt=date_add('2020-06-15',-1)
)old
full outer join
(
    select
        province_id,
        visit_count,
        login_count,
        order_count,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_amount,
        refund_order_count,
        refund_order_amount,
        refund_payment_count,
        refund_payment_amount
    from dws_area_stats_daycount
    where dt='2020-06-15'
)1d_ago
on old.province_id=1d_ago.province_id
left join
(
    select
        province_id,
        visit_count,
        login_count,
        order_count,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_amount,
        refund_order_count,
        refund_order_amount,
        refund_payment_count,
        refund_payment_amount
    from dws_area_stats_daycount
    where dt=date_add('2020-06-15',-7)
)7d_ago
on old.province_id=7d_ago.province_id
left join
(
    select
        province_id,
        visit_count,
        login_count,
        order_count,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_amount,
        refund_order_count,
        refund_order_amount,
        refund_payment_count,
        refund_payment_amount
    from dws_area_stats_daycount
    where dt=date_add('2020-06-15',-30)
)30d_ago
on old.province_id=30d_ago.province_id;
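Each daily load reads four partitions: yesterday's DWT partition plus the DWS partitions for the current day, 7 days back, and 30 days back. To check which `dt` values a given run touches, the offsets used by `date_add` can be reproduced in the shell. This is a small sketch assuming GNU `date` (the `-d` option is not available in BSD/macOS `date`); the helper name `partition_date` is made up.

```shell
#!/bin/sh
# partition_date BASE OFFSET_DAYS: prints BASE shifted by OFFSET_DAYS as yyyy-MM-dd.
# Assumes GNU date; -u avoids daylight-saving surprises in the local time zone.
partition_date() {
  date -u -d "$1 $2 day" +%F
}

do_date=2020-06-15
echo "old     (DWT) dt=$(partition_date "$do_date" -1)"
echo "1d_ago  (DWS) dt=$do_date"
echo "7d_ago  (DWS) dt=$(partition_date "$do_date" -7)"
echo "30d_ago (DWS) dt=$(partition_date "$do_date" -30)"
```

If any of these partitions is missing, the corresponding `nvl(...,0)` terms silently become 0, so it is worth confirming they exist before trusting the rolling metrics.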
# DWT layer first-day data import script
Create the script dws_to_dwt_init.sh in the /home/damoncai/bin directory
#!/bin/bash

APP=gmall

if [ -n "$2" ] ;then
    do_date=$2
else
    echo "Please pass in the date argument"
    exit
fi

dwt_visitor_topic="
insert overwrite table ${APP}.dwt_visitor_topic partition(dt='$do_date')
select
    nvl(1d_ago.mid_id,old.mid_id),
    nvl(1d_ago.brand,old.brand),
    nvl(1d_ago.model,old.model),
    nvl(1d_ago.channel,old.channel),
    nvl(1d_ago.os,old.os),
    nvl(1d_ago.area_code,old.area_code),
    nvl(1d_ago.version_code,old.version_code),
    case
        when old.mid_id is null and 1d_ago.is_new=1 then '$do_date'
        when old.mid_id is null and 1d_ago.is_new=0 then '2020-06-13' -- the exact first visit date is unavailable, so use a date before the warehouse build date
        else old.visit_date_first
    end,
    if(1d_ago.mid_id is not null,'$do_date',old.visit_date_last),
    nvl(1d_ago.visit_count,0),
    if(1d_ago.mid_id is null,0,1),
    nvl(old.visit_last_7d_count,0)+nvl(1d_ago.visit_count,0)-nvl(7d_ago.visit_count,0),
    nvl(old.visit_last_7d_day_count,0)+if(1d_ago.mid_id is null,0,1)-if(7d_ago.mid_id is null,0,1),
    nvl(old.visit_last_30d_count,0)+nvl(1d_ago.visit_count,0)-nvl(30d_ago.visit_count,0),
    nvl(old.visit_last_30d_day_count,0)+if(1d_ago.mid_id is null,0,1)-if(30d_ago.mid_id is null,0,1),
    nvl(old.visit_count,0)+nvl(1d_ago.visit_count,0),
    nvl(old.visit_day_count,0)+if(1d_ago.mid_id is null,0,1)
from
(
    select
        mid_id,
        brand,
        model,
        channel,
        os,
        area_code,
        version_code,
        visit_date_first,
        visit_date_last,
        visit_last_1d_count,
        visit_last_1d_day_count,
        visit_last_7d_count,
        visit_last_7d_day_count,
        visit_last_30d_count,
        visit_last_30d_day_count,
        visit_count,
        visit_day_count
    from ${APP}.dwt_visitor_topic
    where dt=date_add('$do_date',-1)
)old
full outer join
(
    select
        mid_id,
        brand,
        model,
        is_new,
        channel,
        os,
        area_code,
        version_code,
        visit_count
    from ${APP}.dws_visitor_action_daycount
    where dt='$do_date'
)1d_ago
on old.mid_id=1d_ago.mid_id
left join
(
    select
        mid_id,
        brand,
        model,
        is_new,
        channel,
        os,
        area_code,
        version_code,
        visit_count
    from ${APP}.dws_visitor_action_daycount
    where dt=date_add('$do_date',-7)
)7d_ago
on old.mid_id=7d_ago.mid_id
left join
(
    select
        mid_id,
        brand,
        model,
        is_new,
        channel,
        os,
        area_code,
        version_code,
        visit_count
    from ${APP}.dws_visitor_action_daycount
    where dt=date_add('$do_date',-30)
)30d_ago
on old.mid_id=30d_ago.mid_id;
"

dwt_user_topic="
insert overwrite table ${APP}.dwt_user_topic partition(dt='$do_date')
select
    id,
    login_date_first, -- use the user creation date as the first login date
    nvl(login_date_last,date_add('$do_date',-1)), -- use the last login date from history if any, otherwise fall back to a fixed date
    nvl(login_last_1d_count,0),
    nvl(login_last_1d_day_count,0),
    nvl(login_last_7d_count,0),
    nvl(login_last_7d_day_count,0),
    nvl(login_last_30d_count,0),
    nvl(login_last_30d_day_count,0),
    nvl(login_count,0),
    nvl(login_day_count,0),
    order_date_first,
    order_date_last,
    nvl(order_last_1d_count,0),
    nvl(order_activity_last_1d_count,0),
    nvl(order_activity_reduce_last_1d_amount,0),
    nvl(order_coupon_last_1d_count,0),
    nvl(order_coupon_reduce_last_1d_amount,0),
    nvl(order_last_1d_original_amount,0),
    nvl(order_last_1d_final_amount,0),
    nvl(order_last_7d_count,0),
    nvl(order_activity_last_7d_count,0),
    nvl(order_activity_reduce_last_7d_amount,0),
    nvl(order_coupon_last_7d_count,0),
    nvl(order_coupon_reduce_last_7d_amount,0),
    nvl(order_last_7d_original_amount,0),
    nvl(order_last_7d_final_amount,0),
    nvl(order_last_30d_count,0),
    nvl(order_activity_last_30d_count,0),
    nvl(order_activity_reduce_last_30d_amount,0),
    nvl(order_coupon_last_30d_count,0),
    nvl(order_coupon_reduce_last_30d_amount,0),
    nvl(order_last_30d_original_amount,0),
    nvl(order_last_30d_final_amount,0),
    nvl(order_count,0),
    nvl(order_activity_count,0),
    nvl(order_activity_reduce_amount,0),
    nvl(order_coupon_count,0),
    nvl(order_coupon_reduce_amount,0),
    nvl(order_original_amount,0),
    nvl(order_final_amount,0),
    payment_date_first,
    payment_date_last,
    nvl(payment_last_1d_count,0),
    nvl(payment_last_1d_amount,0),
    nvl(payment_last_7d_count,0),
    nvl(payment_last_7d_amount,0),
    nvl(payment_last_30d_count,0),
    nvl(payment_last_30d_amount,0),
    nvl(payment_count,0),
    nvl(payment_amount,0),
    nvl(refund_order_last_1d_count,0),
    nvl(refund_order_last_1d_num,0),
    nvl(refund_order_last_1d_amount,0),
    nvl(refund_order_last_7d_count,0),
    nvl(refund_order_last_7d_num,0),
    nvl(refund_order_last_7d_amount,0),
    nvl(refund_order_last_30d_count,0),
    nvl(refund_order_last_30d_num,0),
    nvl(refund_order_last_30d_amount,0),
    nvl(refund_order_count,0),
    nvl(refund_order_num,0),
    nvl(refund_order_amount,0),
    nvl(refund_payment_last_1d_count,0),
    nvl(refund_payment_last_1d_num,0),
    nvl(refund_payment_last_1d_amount,0),
    nvl(refund_payment_last_7d_count,0),
    nvl(refund_payment_last_7d_num,0),
    nvl(refund_payment_last_7d_amount,0),
    nvl(refund_payment_last_30d_count,0),
    nvl(refund_payment_last_30d_num,0),
    nvl(refund_payment_last_30d_amount,0),
    nvl(refund_payment_count,0),
    nvl(refund_payment_num,0),
    nvl(refund_payment_amount,0),
    nvl(cart_last_1d_count,0),
    nvl(cart_last_7d_count,0),
    nvl(cart_last_30d_count,0),
    nvl(cart_count,0),
    nvl(favor_last_1d_count,0),
    nvl(favor_last_7d_count,0),
    nvl(favor_last_30d_count,0),
    nvl(favor_count,0),
    nvl(coupon_last_1d_get_count,0),
    nvl(coupon_last_1d_using_count,0),
    nvl(coupon_last_1d_used_count,0),
    nvl(coupon_last_7d_get_count,0),
    nvl(coupon_last_7d_using_count,0),
    nvl(coupon_last_7d_used_count,0),
    nvl(coupon_last_30d_get_count,0),
    nvl(coupon_last_30d_using_count,0),
    nvl(coupon_last_30d_used_count,0),
    nvl(coupon_get_count,0),
    nvl(coupon_using_count,0),
    nvl(coupon_used_count,0),
    nvl(appraise_last_1d_good_count,0),
    nvl(appraise_last_1d_mid_count,0),
    nvl(appraise_last_1d_bad_count,0),
    nvl(appraise_last_1d_default_count,0),
    nvl(appraise_last_7d_good_count,0),
    nvl(appraise_last_7d_mid_count,0),
    nvl(appraise_last_7d_bad_count,0),
    nvl(appraise_last_7d_default_count,0),
    nvl(appraise_last_30d_good_count,0),
    nvl(appraise_last_30d_mid_count,0),
    nvl(appraise_last_30d_bad_count,0),
    nvl(appraise_last_30d_default_count,0),
    nvl(appraise_good_count,0),
    nvl(appraise_mid_count,0),
    nvl(appraise_bad_count,0),
    nvl(appraise_default_count,0)
from
(
    select
        id,
        date_format(create_time,'yyyy-MM-dd') login_date_first
    from ${APP}.dim_user_info
    where dt='9999-99-99'
)t1
left join
(
    select
        user_id user_id,
        max(dt) login_date_last,
        sum(if(dt='$do_date',login_count,0)) login_last_1d_count,
        sum(if(dt='$do_date' and login_count>0,1,0)) login_last_1d_day_count,
        sum(if(dt>=date_add('$do_date',-6),login_count,0)) login_last_7d_count,
        sum(if(dt>=date_add('$do_date',-6) and login_count>0,1,0)) login_last_7d_day_count,
        sum(if(dt>=date_add('$do_date',-29),login_count,0)) login_last_30d_count,
        sum(if(dt>=date_add('$do_date',-29) and login_count>0,1,0)) login_last_30d_day_count,
        sum(login_count) login_count,
        sum(if(login_count>0,1,0)) login_day_count,
        min(if(order_count>0,dt,null)) order_date_first,
        max(if(order_count>0,dt,null)) order_date_last,
        sum(if(dt='$do_date',order_count,0)) order_last_1d_count,
        sum(if(dt='$do_date',order_activity_count,0)) order_activity_last_1d_count,
        sum(if(dt='$do_date',order_activity_reduce_amount,0)) order_activity_reduce_last_1d_amount,
        sum(if(dt='$do_date',order_coupon_count,0)) order_coupon_last_1d_count,
        sum(if(dt='$do_date',order_coupon_reduce_amount,0)) order_coupon_reduce_last_1d_amount,
        sum(if(dt='$do_date',order_original_amount,0)) order_last_1d_original_amount,
        sum(if(dt='$do_date',order_final_amount,0)) order_last_1d_final_amount,
        sum(if(dt>=date_add('$do_date',-6),order_count,0)) order_last_7d_count,
        sum(if(dt>=date_add('$do_date',-6),order_activity_count,0)) order_activity_last_7d_count,
        sum(if(dt>=date_add('$do_date',-6),order_activity_reduce_amount,0)) order_activity_reduce_last_7d_amount,
        sum(if(dt>=date_add('$do_date',-6),order_coupon_count,0)) order_coupon_last_7d_count,
        sum(if(dt>=date_add('$do_date',-6),order_coupon_reduce_amount,0)) order_coupon_reduce_last_7d_amount,
        sum(if(dt>=date_add('$do_date',-6),order_original_amount,0)) order_last_7d_original_amount,
        sum(if(dt>=date_add('$do_date',-6),order_final_amount,0)) order_last_7d_final_amount,
        sum(if(dt>=date_add('$do_date',-29),order_count,0)) order_last_30d_count,
sum(if(dt>=date_add('$do_date',-29),order_activity_count,0)) order_activity_last_30d_count, sum(if(dt>=date_add('$do_date',-29),order_activity_reduce_amount,0)) order_activity_reduce_last_30d_amount, sum(if(dt>=date_add('$do_date',-29),order_coupon_count,0)) order_coupon_last_30d_count, sum(if(dt>=date_add('$do_date',-29),order_coupon_reduce_amount,0)) order_coupon_reduce_last_30d_amount, sum(if(dt>=date_add('$do_date',-29),order_original_amount,0)) order_last_30d_original_amount, sum(if(dt>=date_add('$do_date',-29),order_final_amount,0)) order_last_30d_final_amount, sum(order_count) order_count, sum(order_activity_count) order_activity_count, sum(order_activity_reduce_amount) order_activity_reduce_amount, sum(order_coupon_count) order_coupon_count, sum(order_coupon_reduce_amount) order_coupon_reduce_amount, sum(order_original_amount) order_original_amount, sum(order_final_amount) order_final_amount, min(if(payment_count>0,dt,null)) payment_date_first, max(if(payment_count>0,dt,null)) payment_date_last, sum(if(dt='$do_date',payment_count,0)) payment_last_1d_count, sum(if(dt='$do_date',payment_amount,0)) payment_last_1d_amount, sum(if(dt>=date_add('$do_date',-6),payment_count,0)) payment_last_7d_count, sum(if(dt>=date_add('$do_date',-6),payment_amount,0)) payment_last_7d_amount, sum(if(dt>=date_add('$do_date',-29),payment_count,0)) payment_last_30d_count, sum(if(dt>=date_add('$do_date',-29),payment_amount,0)) payment_last_30d_amount, sum(payment_count) payment_count, sum(payment_amount) payment_amount, sum(if(dt='$do_date',refund_order_count,0)) refund_order_last_1d_count, sum(if(dt='$do_date',refund_order_num,0)) refund_order_last_1d_num, sum(if(dt='$do_date',refund_order_amount,0)) refund_order_last_1d_amount, sum(if(dt>=date_add('$do_date',-6),refund_order_count,0)) refund_order_last_7d_count, sum(if(dt>=date_add('$do_date',-6),refund_order_num,0)) refund_order_last_7d_num, sum(if(dt>=date_add('$do_date',-6),refund_order_amount,0)) refund_order_last_7d_amount, 
sum(if(dt>=date_add('$do_date',-29),refund_order_count,0)) refund_order_last_30d_count, sum(if(dt>=date_add('$do_date',-29),refund_order_num,0)) refund_order_last_30d_num, sum(if(dt>=date_add('$do_date',-29),refund_order_amount,0)) refund_order_last_30d_amount, sum(refund_order_count) refund_order_count, sum(refund_order_num) refund_order_num, sum(refund_order_amount) refund_order_amount, sum(if(dt='$do_date',refund_payment_count,0)) refund_payment_last_1d_count, sum(if(dt='$do_date',refund_payment_num,0)) refund_payment_last_1d_num, sum(if(dt='$do_date',refund_payment_amount,0)) refund_payment_last_1d_amount, sum(if(dt>=date_add('$do_date',-6),refund_payment_count,0)) refund_payment_last_7d_count, sum(if(dt>=date_add('$do_date',-6),refund_payment_num,0)) refund_payment_last_7d_num, sum(if(dt>=date_add('$do_date',-6),refund_payment_amount,0)) refund_payment_last_7d_amount, sum(if(dt>=date_add('$do_date',-29),refund_payment_count,0)) refund_payment_last_30d_count, sum(if(dt>=date_add('$do_date',-29),refund_payment_num,0)) refund_payment_last_30d_num, sum(if(dt>=date_add('$do_date',-29),refund_payment_amount,0)) refund_payment_last_30d_amount, sum(refund_payment_count) refund_payment_count, sum(refund_payment_num) refund_payment_num, sum(refund_payment_amount) refund_payment_amount, sum(if(dt='$do_date',cart_count,0)) cart_last_1d_count, sum(if(dt>=date_add('$do_date',-6),cart_count,0)) cart_last_7d_count, sum(if(dt>=date_add('$do_date',-29),cart_count,0)) cart_last_30d_count, sum(cart_count) cart_count, sum(if(dt='$do_date',favor_count,0)) favor_last_1d_count, sum(if(dt>=date_add('$do_date',-6),favor_count,0)) favor_last_7d_count, sum(if(dt>=date_add('$do_date',-29),favor_count,0)) favor_last_30d_count, sum(favor_count) favor_count, sum(if(dt='$do_date',coupon_get_count,0)) coupon_last_1d_get_count, sum(if(dt='$do_date',coupon_using_count,0)) coupon_last_1d_using_count, sum(if(dt='$do_date',coupon_used_count,0)) coupon_last_1d_used_count, 
sum(if(dt>=date_add('$do_date',-6),coupon_get_count,0)) coupon_last_7d_get_count, sum(if(dt>=date_add('$do_date',-6),coupon_using_count,0)) coupon_last_7d_using_count, sum(if(dt>=date_add('$do_date',-6),coupon_used_count,0)) coupon_last_7d_used_count, sum(if(dt>=date_add('$do_date',-29),coupon_get_count,0)) coupon_last_30d_get_count, sum(if(dt>=date_add('$do_date',-29),coupon_using_count,0)) coupon_last_30d_using_count, sum(if(dt>=date_add('$do_date',-29),coupon_used_count,0)) coupon_last_30d_used_count, sum(coupon_get_count) coupon_get_count, sum(coupon_using_count) coupon_using_count, sum(coupon_used_count) coupon_used_count, sum(if(dt='$do_date',appraise_good_count,0)) appraise_last_1d_good_count, sum(if(dt='$do_date',appraise_mid_count,0)) appraise_last_1d_mid_count, sum(if(dt='$do_date',appraise_bad_count,0)) appraise_last_1d_bad_count, sum(if(dt='$do_date',appraise_default_count,0)) appraise_last_1d_default_count, sum(if(dt>=date_add('$do_date',-6),appraise_good_count,0)) appraise_last_7d_good_count, sum(if(dt>=date_add('$do_date',-6),appraise_mid_count,0)) appraise_last_7d_mid_count, sum(if(dt>=date_add('$do_date',-6),appraise_bad_count,0)) appraise_last_7d_bad_count, sum(if(dt>=date_add('$do_date',-6),appraise_default_count,0)) appraise_last_7d_default_count, sum(if(dt>=date_add('$do_date',-29),appraise_good_count,0)) appraise_last_30d_good_count, sum(if(dt>=date_add('$do_date',-29),appraise_mid_count,0)) appraise_last_30d_mid_count, sum(if(dt>=date_add('$do_date',-29),appraise_bad_count,0)) appraise_last_30d_bad_count, sum(if(dt>=date_add('$do_date',-29),appraise_default_count,0)) appraise_last_30d_default_count, sum(appraise_good_count) appraise_good_count, sum(appraise_mid_count) appraise_mid_count, sum(appraise_bad_count) appraise_bad_count, sum(appraise_default_count) appraise_default_count from ${APP}.dws_user_action_daycount group by user_id )t2 on t1.id=t2.user_id; " dwt_sku_topic=" insert overwrite table ${APP}.dwt_sku_topic 
partition(dt='$do_date') select id, nvl(order_last_1d_count,0), nvl(order_last_1d_num,0), nvl(order_activity_last_1d_count,0), nvl(order_coupon_last_1d_count,0), nvl(order_activity_reduce_last_1d_amount,0), nvl(order_coupon_reduce_last_1d_amount,0), nvl(order_last_1d_original_amount,0), nvl(order_last_1d_final_amount,0), nvl(order_last_7d_count,0), nvl(order_last_7d_num,0), nvl(order_activity_last_7d_count,0), nvl(order_coupon_last_7d_count,0), nvl(order_activity_reduce_last_7d_amount,0), nvl(order_coupon_reduce_last_7d_amount,0), nvl(order_last_7d_original_amount,0), nvl(order_last_7d_final_amount,0), nvl(order_last_30d_count,0), nvl(order_last_30d_num,0), nvl(order_activity_last_30d_count,0), nvl(order_coupon_last_30d_count,0), nvl(order_activity_reduce_last_30d_amount,0), nvl(order_coupon_reduce_last_30d_amount,0), nvl(order_last_30d_original_amount,0), nvl(order_last_30d_final_amount,0), nvl(order_count,0), nvl(order_num,0), nvl(order_activity_count,0), nvl(order_coupon_count,0), nvl(order_activity_reduce_amount,0), nvl(order_coupon_reduce_amount,0), nvl(order_original_amount,0), nvl(order_final_amount,0), nvl(payment_last_1d_count,0), nvl(payment_last_1d_num,0), nvl(payment_last_1d_amount,0), nvl(payment_last_7d_count,0), nvl(payment_last_7d_num,0), nvl(payment_last_7d_amount,0), nvl(payment_last_30d_count,0), nvl(payment_last_30d_num,0), nvl(payment_last_30d_amount,0), nvl(payment_count,0), nvl(payment_num,0), nvl(payment_amount,0), nvl(refund_order_last_1d_count,0), nvl(refund_order_last_1d_num,0), nvl(refund_order_last_1d_amount,0), nvl(refund_order_last_7d_count,0), nvl(refund_order_last_7d_num,0), nvl(refund_order_last_7d_amount,0), nvl(refund_order_last_30d_count,0), nvl(refund_order_last_30d_num,0), nvl(refund_order_last_30d_amount,0), nvl(refund_order_count,0), nvl(refund_order_num,0), nvl(refund_order_amount,0), nvl(refund_payment_last_1d_count,0), nvl(refund_payment_last_1d_num,0), nvl(refund_payment_last_1d_amount,0), 
nvl(refund_payment_last_7d_count,0), nvl(refund_payment_last_7d_num,0), nvl(refund_payment_last_7d_amount,0), nvl(refund_payment_last_30d_count,0), nvl(refund_payment_last_30d_num,0), nvl(refund_payment_last_30d_amount,0), nvl(refund_payment_count,0), nvl(refund_payment_num,0), nvl(refund_payment_amount,0), nvl(cart_last_1d_count,0), nvl(cart_last_7d_count,0), nvl(cart_last_30d_count,0), nvl(cart_count,0), nvl(favor_last_1d_count,0), nvl(favor_last_7d_count,0), nvl(favor_last_30d_count,0), nvl(favor_count,0), nvl(appraise_last_1d_good_count,0), nvl(appraise_last_1d_mid_count,0), nvl(appraise_last_1d_bad_count,0), nvl(appraise_last_1d_default_count,0), nvl(appraise_last_7d_good_count,0), nvl(appraise_last_7d_mid_count,0), nvl(appraise_last_7d_bad_count,0), nvl(appraise_last_7d_default_count,0), nvl(appraise_last_30d_good_count,0), nvl(appraise_last_30d_mid_count,0), nvl(appraise_last_30d_bad_count,0), nvl(appraise_last_30d_default_count,0), nvl(appraise_good_count,0), nvl(appraise_mid_count,0), nvl(appraise_bad_count,0), nvl(appraise_default_count,0) from ( select id from ${APP}.dim_sku_info where dt='$do_date' )t1 left join ( select sku_id, sum(if(dt='$do_date',order_count,0)) order_last_1d_count, sum(if(dt='$do_date',order_num,0)) order_last_1d_num, sum(if(dt='$do_date',order_activity_count,0)) order_activity_last_1d_count, sum(if(dt='$do_date',order_coupon_count,0)) order_coupon_last_1d_count, sum(if(dt='$do_date',order_activity_reduce_amount,0)) order_activity_reduce_last_1d_amount, sum(if(dt='$do_date',order_coupon_reduce_amount,0)) order_coupon_reduce_last_1d_amount, sum(if(dt='$do_date',order_original_amount,0)) order_last_1d_original_amount, sum(if(dt='$do_date',order_final_amount,0)) order_last_1d_final_amount, sum(if(dt>=date_add('$do_date',-6),order_count,0)) order_last_7d_count, sum(if(dt>=date_add('$do_date',-6),order_num,0)) order_last_7d_num, sum(if(dt>=date_add('$do_date',-6),order_activity_count,0)) order_activity_last_7d_count, 
sum(if(dt>=date_add('$do_date',-6),order_coupon_count,0)) order_coupon_last_7d_count, sum(if(dt>=date_add('$do_date',-6),order_activity_reduce_amount,0)) order_activity_reduce_last_7d_amount, sum(if(dt>=date_add('$do_date',-6),order_coupon_reduce_amount,0)) order_coupon_reduce_last_7d_amount, sum(if(dt>=date_add('$do_date',-6),order_original_amount,0)) order_last_7d_original_amount, sum(if(dt>=date_add('$do_date',-6),order_final_amount,0)) order_last_7d_final_amount, sum(if(dt>=date_add('$do_date',-29),order_count,0)) order_last_30d_count, sum(if(dt>=date_add('$do_date',-29),order_num,0)) order_last_30d_num, sum(if(dt>=date_add('$do_date',-29),order_activity_count,0)) order_activity_last_30d_count, sum(if(dt>=date_add('$do_date',-29),order_coupon_count,0)) order_coupon_last_30d_count, sum(if(dt>=date_add('$do_date',-29),order_activity_reduce_amount,0)) order_activity_reduce_last_30d_amount, sum(if(dt>=date_add('$do_date',-29),order_coupon_reduce_amount,0)) order_coupon_reduce_last_30d_amount, sum(if(dt>=date_add('$do_date',-29),order_original_amount,0)) order_last_30d_original_amount, sum(if(dt>=date_add('$do_date',-29),order_final_amount,0)) order_last_30d_final_amount, sum(order_count) order_count, sum(order_num) order_num, sum(order_activity_count) order_activity_count, sum(order_coupon_count) order_coupon_count, sum(order_activity_reduce_amount) order_activity_reduce_amount, sum(order_coupon_reduce_amount) order_coupon_reduce_amount, sum(order_original_amount) order_original_amount, sum(order_final_amount) order_final_amount, sum(if(dt='$do_date',payment_count,0)) payment_last_1d_count, sum(if(dt='$do_date',payment_num,0)) payment_last_1d_num, sum(if(dt='$do_date',payment_amount,0)) payment_last_1d_amount, sum(if(dt>=date_add('$do_date',-6),payment_count,0)) payment_last_7d_count, sum(if(dt>=date_add('$do_date',-6),payment_num,0)) payment_last_7d_num, sum(if(dt>=date_add('$do_date',-6),payment_amount,0)) payment_last_7d_amount, 
sum(if(dt>=date_add('$do_date',-29),payment_count,0)) payment_last_30d_count, sum(if(dt>=date_add('$do_date',-29),payment_num,0)) payment_last_30d_num, sum(if(dt>=date_add('$do_date',-29),payment_amount,0)) payment_last_30d_amount, sum(payment_count) payment_count, sum(payment_num) payment_num, sum(payment_amount) payment_amount, sum(if(dt='$do_date',refund_order_count,0)) refund_order_last_1d_count, sum(if(dt='$do_date',refund_order_num,0)) refund_order_last_1d_num, sum(if(dt='$do_date',refund_order_amount,0)) refund_order_last_1d_amount, sum(if(dt>=date_add('$do_date',-6),refund_order_count,0)) refund_order_last_7d_count, sum(if(dt>=date_add('$do_date',-6),refund_order_num,0)) refund_order_last_7d_num, sum(if(dt>=date_add('$do_date',-6),refund_order_amount,0)) refund_order_last_7d_amount, sum(if(dt>=date_add('$do_date',-29),refund_order_count,0)) refund_order_last_30d_count, sum(if(dt>=date_add('$do_date',-29),refund_order_num,0)) refund_order_last_30d_num, sum(if(dt>=date_add('$do_date',-29),refund_order_amount,0)) refund_order_last_30d_amount, sum(refund_order_count) refund_order_count, sum(refund_order_num) refund_order_num, sum(refund_order_amount) refund_order_amount, sum(if(dt='$do_date',refund_payment_count,0)) refund_payment_last_1d_count, sum(if(dt='$do_date',refund_payment_num,0)) refund_payment_last_1d_num, sum(if(dt='$do_date',refund_payment_amount,0)) refund_payment_last_1d_amount, sum(if(dt>=date_add('$do_date',-6),refund_payment_count,0)) refund_payment_last_7d_count, sum(if(dt>=date_add('$do_date',-6),refund_payment_num,0)) refund_payment_last_7d_num, sum(if(dt>=date_add('$do_date',-6),refund_payment_amount,0)) refund_payment_last_7d_amount, sum(if(dt>=date_add('$do_date',-29),refund_payment_count,0)) refund_payment_last_30d_count, sum(if(dt>=date_add('$do_date',-29),refund_payment_num,0)) refund_payment_last_30d_num, sum(if(dt>=date_add('$do_date',-29),refund_payment_amount,0)) refund_payment_last_30d_amount, sum(refund_payment_count) 
refund_payment_count, sum(refund_payment_num) refund_payment_num, sum(refund_payment_amount) refund_payment_amount, sum(if(dt='$do_date',cart_count,0)) cart_last_1d_count, sum(if(dt>=date_add('$do_date',-6),cart_count,0)) cart_last_7d_count, sum(if(dt>=date_add('$do_date',-29),cart_count,0)) cart_last_30d_count, sum(cart_count) cart_count, sum(if(dt='$do_date',favor_count,0)) favor_last_1d_count, sum(if(dt>=date_add('$do_date',-6),favor_count,0)) favor_last_7d_count, sum(if(dt>=date_add('$do_date',-29),favor_count,0)) favor_last_30d_count, sum(favor_count) favor_count, sum(if(dt='$do_date',appraise_good_count,0)) appraise_last_1d_good_count, sum(if(dt='$do_date',appraise_mid_count,0)) appraise_last_1d_mid_count, sum(if(dt='$do_date',appraise_bad_count,0)) appraise_last_1d_bad_count, sum(if(dt='$do_date',appraise_default_count,0)) appraise_last_1d_default_count, sum(if(dt>=date_add('$do_date',-6),appraise_good_count,0)) appraise_last_7d_good_count, sum(if(dt>=date_add('$do_date',-6),appraise_mid_count,0)) appraise_last_7d_mid_count, sum(if(dt>=date_add('$do_date',-6),appraise_bad_count,0)) appraise_last_7d_bad_count, sum(if(dt>=date_add('$do_date',-6),appraise_default_count,0)) appraise_last_7d_default_count, sum(if(dt>=date_add('$do_date',-29),appraise_good_count,0)) appraise_last_30d_good_count, sum(if(dt>=date_add('$do_date',-29),appraise_mid_count,0)) appraise_last_30d_mid_count, sum(if(dt>=date_add('$do_date',-29),appraise_bad_count,0)) appraise_last_30d_bad_count, sum(if(dt>=date_add('$do_date',-29),appraise_default_count,0)) appraise_last_30d_default_count, sum(appraise_good_count) appraise_good_count, sum(appraise_mid_count) appraise_mid_count, sum(appraise_bad_count) appraise_bad_count, sum(appraise_default_count) appraise_default_count from ${APP}.dws_sku_action_daycount group by sku_id )t2 on t1.id=t2.sku_id; " dwt_coupon_topic=" insert overwrite table ${APP}.dwt_coupon_topic partition(dt='$do_date') select id, nvl(get_last_1d_count,0), 
nvl(get_last_7d_count,0), nvl(get_last_30d_count,0), nvl(get_count,0), nvl(order_last_1d_count,0), nvl(order_last_1d_reduce_amount,0), nvl(order_last_1d_original_amount,0), nvl(order_last_1d_final_amount,0), nvl(order_last_7d_count,0), nvl(order_last_7d_reduce_amount,0), nvl(order_last_7d_original_amount,0), nvl(order_last_7d_final_amount,0), nvl(order_last_30d_count,0), nvl(order_last_30d_reduce_amount,0), nvl(order_last_30d_original_amount,0), nvl(order_last_30d_final_amount,0), nvl(order_count,0), nvl(order_reduce_amount,0), nvl(order_original_amount,0), nvl(order_final_amount,0), nvl(payment_last_1d_count,0), nvl(payment_last_1d_reduce_amount,0), nvl(payment_last_1d_amount,0), nvl(payment_last_7d_count,0), nvl(payment_last_7d_reduce_amount,0), nvl(payment_last_7d_amount,0), nvl(payment_last_30d_count,0), nvl(payment_last_30d_reduce_amount,0), nvl(payment_last_30d_amount,0), nvl(payment_count,0), nvl(payment_reduce_amount,0), nvl(payment_amount,0), nvl(expire_last_1d_count,0), nvl(expire_last_7d_count,0), nvl(expire_last_30d_count,0), nvl(expire_count,0) from ( select id from ${APP}.dim_coupon_info where dt='$do_date' )t1 left join ( select coupon_id coupon_id, sum(if(dt='$do_date',get_count,0)) get_last_1d_count, sum(if(dt>=date_add('$do_date',-6),get_count,0)) get_last_7d_count, sum(if(dt>=date_add('$do_date',-29),get_count,0)) get_last_30d_count, sum(get_count) get_count, sum(if(dt='$do_date',order_count,0)) order_last_1d_count, sum(if(dt='$do_date',order_reduce_amount,0)) order_last_1d_reduce_amount, sum(if(dt='$do_date',order_original_amount,0)) order_last_1d_original_amount, sum(if(dt='$do_date',order_final_amount,0)) order_last_1d_final_amount, sum(if(dt>=date_add('$do_date',-6),order_count,0)) order_last_7d_count, sum(if(dt>=date_add('$do_date',-6),order_reduce_amount,0)) order_last_7d_reduce_amount, sum(if(dt>=date_add('$do_date',-6),order_original_amount,0)) order_last_7d_original_amount, sum(if(dt>=date_add('$do_date',-6),order_final_amount,0)) 
order_last_7d_final_amount, sum(if(dt>=date_add('$do_date',-29),order_count,0)) order_last_30d_count, sum(if(dt>=date_add('$do_date',-29),order_reduce_amount,0)) order_last_30d_reduce_amount, sum(if(dt>=date_add('$do_date',-29),order_original_amount,0)) order_last_30d_original_amount, sum(if(dt>=date_add('$do_date',-29),order_final_amount,0)) order_last_30d_final_amount, sum(order_count) order_count, sum(order_reduce_amount) order_reduce_amount, sum(order_original_amount) order_original_amount, sum(order_final_amount) order_final_amount, sum(if(dt='$do_date',payment_count,0)) payment_last_1d_count, sum(if(dt='$do_date',payment_reduce_amount,0)) payment_last_1d_reduce_amount, sum(if(dt='$do_date',payment_amount,0)) payment_last_1d_amount, sum(if(dt>=date_add('$do_date',-6),payment_count,0)) payment_last_7d_count, sum(if(dt>=date_add('$do_date',-6),payment_reduce_amount,0)) payment_last_7d_reduce_amount, sum(if(dt>=date_add('$do_date',-6),payment_amount,0)) payment_last_7d_amount, sum(if(dt>=date_add('$do_date',-29),payment_count,0)) payment_last_30d_count, sum(if(dt>=date_add('$do_date',-29),payment_reduce_amount,0)) payment_last_30d_reduce_amount, sum(if(dt>=date_add('$do_date',-29),payment_amount,0)) payment_last_30d_amount, sum(payment_count) payment_count, sum(payment_reduce_amount) payment_reduce_amount, sum(payment_amount) payment_amount, sum(if(dt='$do_date',expire_count,0)) expire_last_1d_count, sum(if(dt>=date_add('$do_date',-6),expire_count,0)) expire_last_7d_count, sum(if(dt>=date_add('$do_date',-29),expire_count,0)) expire_last_30d_count, sum(expire_count) expire_count from ${APP}.dws_coupon_info_daycount group by coupon_id )t2 on t1.id=t2.coupon_id; " dwt_activity_topic=" insert overwrite table ${APP}.dwt_activity_topic partition(dt='$do_date') select t1.activity_rule_id, t1.activity_id, nvl(order_last_1d_count,0), nvl(order_last_1d_reduce_amount,0), nvl(order_last_1d_original_amount,0), nvl(order_last_1d_final_amount,0), nvl(order_count,0), 
nvl(order_reduce_amount,0), nvl(order_original_amount,0), nvl(order_final_amount,0), nvl(payment_last_1d_count,0), nvl(payment_last_1d_reduce_amount,0), nvl(payment_last_1d_amount,0), nvl(payment_count,0), nvl(payment_reduce_amount,0), nvl(payment_amount,0) from ( select activity_rule_id, activity_id from ${APP}.dim_activity_rule_info where dt='$do_date' )t1 left join ( select activity_rule_id, activity_id, sum(if(dt='$do_date',order_count,0)) order_last_1d_count, sum(if(dt='$do_date',order_reduce_amount,0)) order_last_1d_reduce_amount, sum(if(dt='$do_date',order_original_amount,0)) order_last_1d_original_amount, sum(if(dt='$do_date',order_final_amount,0)) order_last_1d_final_amount, sum(order_count) order_count, sum(order_reduce_amount) order_reduce_amount, sum(order_original_amount) order_original_amount, sum(order_final_amount) order_final_amount, sum(if(dt='$do_date',payment_count,0)) payment_last_1d_count, sum(if(dt='$do_date',payment_reduce_amount,0)) payment_last_1d_reduce_amount, sum(if(dt='$do_date',payment_amount,0)) payment_last_1d_amount, sum(payment_count) payment_count, sum(payment_reduce_amount) payment_reduce_amount, sum(payment_amount) payment_amount from ${APP}.dws_activity_info_daycount group by activity_rule_id,activity_id )t2 on t1.activity_rule_id=t2.activity_rule_id and t1.activity_id=t2.activity_id; " dwt_area_topic=" insert overwrite table ${APP}.dwt_area_topic partition(dt='$do_date') select id, nvl(visit_last_1d_count,0), nvl(login_last_1d_count,0), nvl(visit_last_7d_count,0), nvl(login_last_7d_count,0), nvl(visit_last_30d_count,0), nvl(login_last_30d_count,0), nvl(visit_count,0), nvl(login_count,0), nvl(order_last_1d_count,0), nvl(order_last_1d_original_amount,0), nvl(order_last_1d_final_amount,0), nvl(order_last_7d_count,0), nvl(order_last_7d_original_amount,0), nvl(order_last_7d_final_amount,0), nvl(order_last_30d_count,0), nvl(order_last_30d_original_amount,0), nvl(order_last_30d_final_amount,0), nvl(order_count,0), 
nvl(order_original_amount,0), nvl(order_final_amount,0), nvl(payment_last_1d_count,0), nvl(payment_last_1d_amount,0), nvl(payment_last_7d_count,0), nvl(payment_last_7d_amount,0), nvl(payment_last_30d_count,0), nvl(payment_last_30d_amount,0), nvl(payment_count,0), nvl(payment_amount,0), nvl(refund_order_last_1d_count,0), nvl(refund_order_last_1d_amount,0), nvl(refund_order_last_7d_count,0), nvl(refund_order_last_7d_amount,0), nvl(refund_order_last_30d_count,0), nvl(refund_order_last_30d_amount,0), nvl(refund_order_count,0), nvl(refund_order_amount,0), nvl(refund_payment_last_1d_count,0), nvl(refund_payment_last_1d_amount,0), nvl(refund_payment_last_7d_count,0), nvl(refund_payment_last_7d_amount,0), nvl(refund_payment_last_30d_count,0), nvl(refund_payment_last_30d_amount,0), nvl(refund_payment_count,0), nvl(refund_payment_amount,0) from ( select id from ${APP}.dim_base_province )t1 left join ( select province_id province_id, sum(if(dt='$do_date',visit_count,0)) visit_last_1d_count, sum(if(dt='$do_date',login_count,0)) login_last_1d_count, sum(if(dt>=date_add('$do_date',-6),visit_count,0)) visit_last_7d_count, sum(if(dt>=date_add('$do_date',-6),login_count,0)) login_last_7d_count, sum(if(dt>=date_add('$do_date',-29),visit_count,0)) visit_last_30d_count, sum(if(dt>=date_add('$do_date',-29),login_count,0)) login_last_30d_count, sum(visit_count) visit_count, sum(login_count) login_count, sum(if(dt='$do_date',order_count,0)) order_last_1d_count, sum(if(dt='$do_date',order_original_amount,0)) order_last_1d_original_amount, sum(if(dt='$do_date',order_final_amount,0)) order_last_1d_final_amount, sum(if(dt>=date_add('$do_date',-6),order_count,0)) order_last_7d_count, sum(if(dt>=date_add('$do_date',-6),order_original_amount,0)) order_last_7d_original_amount, sum(if(dt>=date_add('$do_date',-6),order_final_amount,0)) order_last_7d_final_amount, sum(if(dt>=date_add('$do_date',-29),order_count,0)) order_last_30d_count, sum(if(dt>=date_add('$do_date',-29),order_original_amount,0)) 
order_last_30d_original_amount, sum(if(dt>=date_add('$do_date',-29),order_final_amount,0)) order_last_30d_final_amount, sum(order_count) order_count, sum(order_original_amount) order_original_amount, sum(order_final_amount) order_final_amount, sum(if(dt='$do_date',payment_count,0)) payment_last_1d_count, sum(if(dt='$do_date',payment_amount,0)) payment_last_1d_amount, sum(if(dt>=date_add('$do_date',-6),payment_count,0)) payment_last_7d_count, sum(if(dt>=date_add('$do_date',-6),payment_amount,0)) payment_last_7d_amount, sum(if(dt>=date_add('$do_date',-29),payment_count,0)) payment_last_30d_count, sum(if(dt>=date_add('$do_date',-29),payment_amount,0)) payment_last_30d_amount, sum(payment_count) payment_count, sum(payment_amount) payment_amount, sum(if(dt='$do_date',refund_order_count,0)) refund_order_last_1d_count, sum(if(dt='$do_date',refund_order_amount,0)) refund_order_last_1d_amount, sum(if(dt>=date_add('$do_date',-6),refund_order_count,0)) refund_order_last_7d_count, sum(if(dt>=date_add('$do_date',-6),refund_order_amount,0)) refund_order_last_7d_amount, sum(if(dt>=date_add('$do_date',-29),refund_order_count,0)) refund_order_last_30d_count, sum(if(dt>=date_add('$do_date',-29),refund_order_amount,0)) refund_order_last_30d_amount, sum(refund_order_count) refund_order_count, sum(refund_order_amount) refund_order_amount, sum(if(dt='$do_date',refund_payment_count,0)) refund_payment_last_1d_count, sum(if(dt='$do_date',refund_payment_amount,0)) refund_payment_last_1d_amount, sum(if(dt>=date_add('$do_date',-6),refund_payment_count,0)) refund_payment_last_7d_count, sum(if(dt>=date_add('$do_date',-6),refund_payment_amount,0)) refund_payment_last_7d_amount, sum(if(dt>=date_add('$do_date',-29),refund_payment_count,0)) refund_payment_last_30d_count, sum(if(dt>=date_add('$do_date',-29),refund_payment_amount,0)) refund_payment_last_30d_amount, sum(refund_payment_count) refund_payment_count, sum(refund_payment_amount) refund_payment_amount from ${APP}.dws_area_stats_daycount 
    group by province_id
)t2
on t1.id=t2.province_id;
"

case $1 in
    "dwt_visitor_topic" )
        hive -e "$dwt_visitor_topic"
    ;;
    "dwt_user_topic" )
        hive -e "$dwt_user_topic"
    ;;
    "dwt_sku_topic" )
        hive -e "$dwt_sku_topic"
    ;;
    "dwt_activity_topic" )
        hive -e "$dwt_activity_topic"
    ;;
    "dwt_coupon_topic" )
        hive -e "$dwt_coupon_topic"
    ;;
    "dwt_area_topic" )
        hive -e "$dwt_area_topic"
    ;;
    "all" )
        hive -e "$dwt_visitor_topic$dwt_user_topic$dwt_sku_topic$dwt_activity_topic$dwt_coupon_topic$dwt_area_topic"
    ;;
esac
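The script above keeps its windowed metrics current in two ways. dwt_visitor_topic rolls each 7-day and 30-day counter forward (previous total, plus today, minus the day that just left the window), while the other topics re-aggregate the DWS history with lower bounds of do_date-6 and do_date-29. The sketch below mirrors that arithmetic in plain shell with made-up numbers. roll_window and window_start are hypothetical helper names that do not appear in the original scripts, and window_start assumes GNU date.

```shell
#!/bin/bash
# Hypothetical helpers (not part of dws_to_dwt_init.sh) that mirror the
# window arithmetic used by the DWT queries.

# Rolling update, as in dwt_visitor_topic:
#   new_total = old_total + today_count - count_of_the_day_leaving_the_window
# Empty arguments are treated as 0, like nvl(x,0) in the query.
roll_window() {
    local old_total=${1:-0}
    local today=${2:-0}
    local expired=${3:-0}
    echo $(( old_total + today - expired ))
}

# Inclusive lower bound of an N-day window ending on do_date, i.e. the date
# that date_add('$do_date',-(N-1)) yields in the queries. Assumes GNU date.
window_start() {
    local do_date=$1
    local days=$2
    date -u -d "$do_date -$(( days - 1 )) day" +%F
}

roll_window 20 3 5            # 7-day total was 20, 3 visits today, 5 expired: prints 18
window_start 2020-06-14 7     # prints 2020-06-08
window_start 2020-06-14 30    # prints 2020-05-16
```

The rolling form only needs three partitions per run, which is presumably why it is used for the visitor topic; the re-aggregating form scans the whole DWS table but never depends on the previous DWT partition being correct.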
Add execute permission to the script
Run the script
dws_to_dwt_init.sh all 2020-06-14
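The first argument selects which load statement runs, and the second supplies the date. As written, the `case` statement at the end of `dws_to_dwt_init.sh` has no default branch, so a mistyped selector runs nothing and the script still exits with status 0. The following is a minimal sketch of a guard for that situation. The helper name `topics_for` and the error message are assumptions for illustration, not part of the original script.

```shell
#!/bin/sh
# Illustrative sketch only: topics_for is a hypothetical helper, not part of the original script.
# It prints the load variables that a given first argument would select.
topics_for() {
    case $1 in
        "all" )
            echo "dwt_visitor_topic dwt_user_topic dwt_sku_topic dwt_activity_topic dwt_coupon_topic dwt_area_topic"
        ;;
        "dwt_visitor_topic" | "dwt_user_topic" | "dwt_sku_topic" | "dwt_activity_topic" | "dwt_coupon_topic" | "dwt_area_topic" )
            echo "$1"
        ;;
        * )
            # Assumption: report unrecognized selectors instead of doing nothing silently
            echo "unknown selector: $1" >&2
            return 1
        ;;
    esac
}

topics_for all
topics_for dwt_user_topic
topics_for dwt_usr_topic || echo "nothing would run"
```

To load a single table rather than all six, pass its name as the first argument, for example `dws_to_dwt_init.sh dwt_user_topic 2020-06-14`.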
# DWT layer daily data load script
Create the script dws_to_dwt.sh in the /home/damoncai/bin directory
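The daily script in this section does not rescan the last 7 or 30 days of DWS data. For each rolling metric it takes the previous day's DWT value (`old`), adds the current day's DWS value (`1d_ago`), and subtracts the value of the day that just left the window (`7d_ago` or `30d_ago`). The following is a minimal sketch of that arithmetic; the function names `roll_window` and `sum` and the daily counts are made up for illustration.

```shell
#!/bin/sh
# Illustrative sketch only: function names and sample numbers are hypothetical.

# new N-day total = previous N-day total + today's value - value that just left the window
roll_window() {
    echo $(( ${1:-0} + ${2:-0} - ${3:-0} ))
}

# Sum all arguments
sum() {
    total=0
    for v in "$@"; do
        total=$(( total + v ))
    done
    echo "$total"
}

# Made-up daily visit counts for days 1..8: 3 0 5 2 0 1 4 6 (day 8 is "today")
old_7d=$(sum 3 0 5 2 0 1 4)               # 7-day total as of day 7 (days 1..7)
new_7d=$(roll_window "$old_7d" 6 3)       # add day 8, drop day 1
check_7d=$(sum 0 5 2 0 1 4 6)             # direct recomputation over days 2..8

echo "incremental=$new_7d direct=$check_7d"
```

Both values come out as 18. The shortcut is only correct when the previous day's DWT partition exists and is itself correct, which is why each statement reads the partition `dt=date_add('$do_date',-1)`; the daily load therefore presumably depends on the initialization script having been run first.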
#!/bin/bash

APP=gmall

# Use the date passed as the second argument if there is one; otherwise use the day before the current date
if [ -n "$2" ] ;then
    do_date=$2
else
    do_date=`date -d "-1 day" +%F`
fi

# Partition from two days before do_date, dropped after each table is loaded
clear_date=`date -d "$do_date -2 day" +%F`

dwt_visitor_topic="
insert overwrite table ${APP}.dwt_visitor_topic partition(dt='$do_date')
select
    nvl(1d_ago.mid_id,old.mid_id),
    nvl(1d_ago.brand,old.brand),
    nvl(1d_ago.model,old.model),
    nvl(1d_ago.channel,old.channel),
    nvl(1d_ago.os,old.os),
    nvl(1d_ago.area_code,old.area_code),
    nvl(1d_ago.version_code,old.version_code),
    case
        when old.mid_id is null and 1d_ago.is_new=1 then '$do_date'
        when old.mid_id is null and 1d_ago.is_new=0 then '2020-06-13' -- the exact first login date is unavailable, so use a date before the warehouse was set up
        else old.visit_date_first
    end,
    if(1d_ago.mid_id is not null,'$do_date',old.visit_date_last),
    nvl(1d_ago.visit_count,0),
    if(1d_ago.mid_id is null,0,1),
    nvl(old.visit_last_7d_count,0)+nvl(1d_ago.visit_count,0)-nvl(7d_ago.visit_count,0),
    nvl(old.visit_last_7d_day_count,0)+if(1d_ago.mid_id is null,0,1)-if(7d_ago.mid_id is null,0,1),
    nvl(old.visit_last_30d_count,0)+nvl(1d_ago.visit_count,0)-nvl(30d_ago.visit_count,0),
    nvl(old.visit_last_30d_day_count,0)+if(1d_ago.mid_id is null,0,1)-if(30d_ago.mid_id is null,0,1),
    nvl(old.visit_count,0)+nvl(1d_ago.visit_count,0),
    nvl(old.visit_day_count,0)+if(1d_ago.mid_id is null,0,1)
from
(
    select
        mid_id,
        brand,
        model,
        channel,
        os,
        area_code,
        version_code,
        visit_date_first,
        visit_date_last,
        visit_last_1d_count,
        visit_last_1d_day_count,
        visit_last_7d_count,
        visit_last_7d_day_count,
        visit_last_30d_count,
        visit_last_30d_day_count,
        visit_count,
        visit_day_count
    from ${APP}.dwt_visitor_topic
    where dt=date_add('$do_date',-1)
)old
full outer join
(
    select
        mid_id,
        brand,
        model,
        is_new,
        channel,
        os,
        area_code,
        version_code,
        visit_count
    from ${APP}.dws_visitor_action_daycount
    where dt='$do_date'
)1d_ago
on old.mid_id=1d_ago.mid_id
left join
(
    select
        mid_id,
        brand,
        model,
        is_new,
        channel,
        os,
        area_code,
        version_code,
        visit_count
    from ${APP}.dws_visitor_action_daycount
    where dt=date_add('$do_date',-7)
)7d_ago
on old.mid_id=7d_ago.mid_id
left join
(
    select
        mid_id,
        brand,
        model,
        is_new,
        channel,
        os,
        area_code,
        version_code,
        visit_count
    from ${APP}.dws_visitor_action_daycount
    where dt=date_add('$do_date',-30)
)30d_ago
on old.mid_id=30d_ago.mid_id;
alter table ${APP}.dwt_visitor_topic drop partition(dt='$clear_date');
"

dwt_user_topic="
insert overwrite table ${APP}.dwt_user_topic partition(dt='$do_date')
select
    nvl(1d_ago.user_id,old.user_id),
    nvl(old.login_date_first,'$do_date'),
    if(1d_ago.user_id is not null,'$do_date',old.login_date_last),
    nvl(1d_ago.login_count,0),
    if(1d_ago.user_id is not null,1,0),
    nvl(old.login_last_7d_count,0)+nvl(1d_ago.login_count,0)-nvl(7d_ago.login_count,0),
    nvl(old.login_last_7d_day_count,0)+if(1d_ago.user_id is null,0,1)-if(7d_ago.user_id is null,0,1),
    nvl(old.login_last_30d_count,0)+nvl(1d_ago.login_count,0)-nvl(30d_ago.login_count,0),
    nvl(old.login_last_30d_day_count,0)+if(1d_ago.user_id is null,0,1)-if(30d_ago.user_id is null,0,1),
    nvl(old.login_count,0)+nvl(1d_ago.login_count,0),
    nvl(old.login_day_count,0)+if(1d_ago.user_id is not null,1,0),
    if(old.order_date_first is null and 1d_ago.order_count>0, '$do_date', old.order_date_first),
    if(1d_ago.order_count>0,'$do_date',old.order_date_last),
    nvl(1d_ago.order_count,0),
    nvl(1d_ago.order_activity_count,0),
    nvl(1d_ago.order_activity_reduce_amount,0.0),
    nvl(1d_ago.order_coupon_count,0),
    nvl(1d_ago.order_coupon_reduce_amount,0.0),
    nvl(1d_ago.order_original_amount,0.0),
    nvl(1d_ago.order_final_amount,0.0),
    nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)-nvl(7d_ago.order_count,0),
    nvl(old.order_activity_last_7d_count,0)+nvl(1d_ago.order_activity_count,0)-nvl(7d_ago.order_activity_count,0),
    nvl(old.order_activity_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)-nvl(7d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_last_7d_count,0)+nvl(1d_ago.order_coupon_count,0)-nvl(7d_ago.order_coupon_count,0),
    nvl(old.order_coupon_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)-nvl(7d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(7d_ago.order_original_amount,0.0),
    nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(7d_ago.order_final_amount,0.0),
    nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)-nvl(30d_ago.order_count,0),
    nvl(old.order_activity_last_30d_count,0)+nvl(1d_ago.order_activity_count,0)-nvl(30d_ago.order_activity_count,0),
    nvl(old.order_activity_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)-nvl(30d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_last_30d_count,0)+nvl(1d_ago.order_coupon_count,0)-nvl(30d_ago.order_coupon_count,0),
    nvl(old.order_coupon_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)-nvl(30d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(30d_ago.order_original_amount,0.0),
    nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(30d_ago.order_final_amount,0.0),
    nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
    nvl(old.order_activity_count,0)+nvl(1d_ago.order_activity_count,0),
    nvl(old.order_activity_reduce_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_count,0)+nvl(1d_ago.order_coupon_count,0),
    nvl(old.order_coupon_reduce_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
    nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
    if(old.payment_date_first is null and 1d_ago.payment_count>0, '$do_date', old.payment_date_first),
    if(1d_ago.payment_count>0,'$do_date',old.payment_date_last),
    nvl(1d_ago.payment_count,0),
    nvl(1d_ago.payment_amount,0.0),
    nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)-nvl(7d_ago.payment_count,0),
    nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(7d_ago.payment_amount,0.0),
    nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)-nvl(30d_ago.payment_count,0),
    nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(30d_ago.payment_amount,0.0),
    nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
    nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
    nvl(1d_ago.refund_order_count,0),
    nvl(1d_ago.refund_order_num,0),
    nvl(1d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)-nvl(7d_ago.refund_order_count,0),
    nvl(old.refund_order_last_7d_num,0)+nvl(1d_ago.refund_order_num,0)-nvl(7d_ago.refund_order_num,0),
    nvl(old.refund_order_last_7d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)-nvl(7d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)-nvl(30d_ago.refund_order_count,0),
    nvl(old.refund_order_last_30d_num,0)+nvl(1d_ago.refund_order_num,0)-nvl(30d_ago.refund_order_num,0),
    nvl(old.refund_order_last_30d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)-nvl(30d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
    nvl(old.refund_order_num,0)+nvl(1d_ago.refund_order_num,0),
    nvl(old.refund_order_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0),
    nvl(1d_ago.refund_payment_count,0),
    nvl(1d_ago.refund_payment_num,0),
    nvl(1d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(7d_ago.refund_payment_count,0),
    nvl(old.refund_payment_last_7d_num,0)+nvl(1d_ago.refund_payment_num,0)-nvl(7d_ago.refund_payment_num,0),
    nvl(old.refund_payment_last_7d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)-nvl(7d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(30d_ago.refund_payment_count,0),
    nvl(old.refund_payment_last_30d_num,0)+nvl(1d_ago.refund_payment_num,0)-nvl(30d_ago.refund_payment_num,0),
    nvl(old.refund_payment_last_30d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)-nvl(30d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
    nvl(old.refund_payment_num,0)+nvl(1d_ago.refund_payment_num,0),
    nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0),
    nvl(1d_ago.cart_count,0),
    nvl(old.cart_last_7d_count,0)+nvl(1d_ago.cart_count,0)-nvl(7d_ago.cart_count,0),
    nvl(old.cart_last_30d_count,0)+nvl(1d_ago.cart_count,0)-nvl(30d_ago.cart_count,0),
    nvl(old.cart_count,0)+nvl(1d_ago.cart_count,0),
    nvl(1d_ago.favor_count,0),
    nvl(old.favor_last_7d_count,0)+nvl(1d_ago.favor_count,0)-nvl(7d_ago.favor_count,0),
    nvl(old.favor_last_30d_count,0)+nvl(1d_ago.favor_count,0)-nvl(30d_ago.favor_count,0),
    nvl(old.favor_count,0)+nvl(1d_ago.favor_count,0),
    nvl(1d_ago.coupon_get_count,0),
    nvl(1d_ago.coupon_using_count,0),
    nvl(1d_ago.coupon_used_count,0),
    nvl(old.coupon_last_7d_get_count,0)+nvl(1d_ago.coupon_get_count,0)-nvl(7d_ago.coupon_get_count,0),
    nvl(old.coupon_last_7d_using_count,0)+nvl(1d_ago.coupon_using_count,0)-nvl(7d_ago.coupon_using_count,0),
    nvl(old.coupon_last_7d_used_count,0)+nvl(1d_ago.coupon_used_count,0)-nvl(7d_ago.coupon_used_count,0),
    nvl(old.coupon_last_30d_get_count,0)+nvl(1d_ago.coupon_get_count,0)-nvl(30d_ago.coupon_get_count,0),
    nvl(old.coupon_last_30d_using_count,0)+nvl(1d_ago.coupon_using_count,0)-nvl(30d_ago.coupon_using_count,0),
    nvl(old.coupon_last_30d_used_count,0)+nvl(1d_ago.coupon_used_count,0)-nvl(30d_ago.coupon_used_count,0),
    nvl(old.coupon_get_count,0)+nvl(1d_ago.coupon_get_count,0),
    nvl(old.coupon_using_count,0)+nvl(1d_ago.coupon_using_count,0),
    nvl(old.coupon_used_count,0)+nvl(1d_ago.coupon_used_count,0),
    nvl(1d_ago.appraise_good_count,0),
    nvl(1d_ago.appraise_mid_count,0),
    nvl(1d_ago.appraise_bad_count,0),
    nvl(1d_ago.appraise_default_count,0),
    nvl(old.appraise_last_7d_good_count,0)+nvl(1d_ago.appraise_good_count,0)-nvl(7d_ago.appraise_good_count,0),
    nvl(old.appraise_last_7d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(7d_ago.appraise_mid_count,0),
    nvl(old.appraise_last_7d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(7d_ago.appraise_bad_count,0),
    nvl(old.appraise_last_7d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(7d_ago.appraise_default_count,0),
    nvl(old.appraise_last_30d_good_count,0)+nvl(1d_ago.appraise_good_count,0)-nvl(30d_ago.appraise_good_count,0),
    nvl(old.appraise_last_30d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(30d_ago.appraise_mid_count,0),
    nvl(old.appraise_last_30d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(30d_ago.appraise_bad_count,0),
    nvl(old.appraise_last_30d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(30d_ago.appraise_default_count,0),
    nvl(old.appraise_good_count,0)+nvl(1d_ago.appraise_good_count,0),
    nvl(old.appraise_mid_count,0)+nvl(1d_ago.appraise_mid_count,0),
    nvl(old.appraise_bad_count,0)+nvl(1d_ago.appraise_bad_count,0),
    nvl(old.appraise_default_count,0)+nvl(1d_ago.appraise_default_count,0)
from
(
    select
        user_id,
        login_date_first,
        login_date_last,
        login_date_1d_count,
        login_last_1d_day_count,
        login_last_7d_count,
        login_last_7d_day_count,
        login_last_30d_count,
        login_last_30d_day_count,
        login_count,
        login_day_count,
        order_date_first,
        order_date_last,
        order_last_1d_count,
        order_activity_last_1d_count,
        order_activity_reduce_last_1d_amount,
        order_coupon_last_1d_count,
        order_coupon_reduce_last_1d_amount,
        order_last_1d_original_amount,
        order_last_1d_final_amount,
        order_last_7d_count,
        order_activity_last_7d_count,
        order_activity_reduce_last_7d_amount,
        order_coupon_last_7d_count,
        order_coupon_reduce_last_7d_amount,
        order_last_7d_original_amount,
        order_last_7d_final_amount,
        order_last_30d_count,
        order_activity_last_30d_count,
        order_activity_reduce_last_30d_amount,
        order_coupon_last_30d_count,
        order_coupon_reduce_last_30d_amount,
        order_last_30d_original_amount,
        order_last_30d_final_amount,
        order_count,
        order_activity_count,
        order_activity_reduce_amount,
        order_coupon_count,
        order_coupon_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_date_first,
        payment_date_last,
        payment_last_1d_count,
        payment_last_1d_amount,
        payment_last_7d_count,
        payment_last_7d_amount,
        payment_last_30d_count,
        payment_last_30d_amount,
        payment_count,
        payment_amount,
        refund_order_last_1d_count,
        refund_order_last_1d_num,
        refund_order_last_1d_amount,
        refund_order_last_7d_count,
        refund_order_last_7d_num,
        refund_order_last_7d_amount,
        refund_order_last_30d_count,
        refund_order_last_30d_num,
        refund_order_last_30d_amount,
        refund_order_count,
        refund_order_num,
        refund_order_amount,
        refund_payment_last_1d_count,
        refund_payment_last_1d_num,
        refund_payment_last_1d_amount,
        refund_payment_last_7d_count,
        refund_payment_last_7d_num,
        refund_payment_last_7d_amount,
        refund_payment_last_30d_count,
        refund_payment_last_30d_num,
        refund_payment_last_30d_amount,
        refund_payment_count,
        refund_payment_num,
        refund_payment_amount,
        cart_last_1d_count,
        cart_last_7d_count,
        cart_last_30d_count,
        cart_count,
        favor_last_1d_count,
        favor_last_7d_count,
        favor_last_30d_count,
        favor_count,
        coupon_last_1d_get_count,
        coupon_last_1d_using_count,
        coupon_last_1d_used_count,
        coupon_last_7d_get_count,
        coupon_last_7d_using_count,
        coupon_last_7d_used_count,
        coupon_last_30d_get_count,
        coupon_last_30d_using_count,
        coupon_last_30d_used_count,
        coupon_get_count,
        coupon_using_count,
        coupon_used_count,
        appraise_last_1d_good_count,
        appraise_last_1d_mid_count,
        appraise_last_1d_bad_count,
        appraise_last_1d_default_count,
        appraise_last_7d_good_count,
        appraise_last_7d_mid_count,
        appraise_last_7d_bad_count,
        appraise_last_7d_default_count,
        appraise_last_30d_good_count,
        appraise_last_30d_mid_count,
        appraise_last_30d_bad_count,
        appraise_last_30d_default_count,
        appraise_good_count,
        appraise_mid_count,
        appraise_bad_count,
        appraise_default_count
    from ${APP}.dwt_user_topic
    where dt=date_add('$do_date',-1)
)old
full outer join
(
    select
        user_id,
        login_count,
        cart_count,
        favor_count,
        order_count,
        order_activity_count,
        order_activity_reduce_amount,
        order_coupon_count,
        order_coupon_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_amount,
        refund_order_count,
        refund_order_num,
        refund_order_amount,
        refund_payment_count,
        refund_payment_num,
        refund_payment_amount,
        coupon_get_count,
        coupon_using_count,
        coupon_used_count,
        appraise_good_count,
        appraise_mid_count,
        appraise_bad_count,
        appraise_default_count
    from ${APP}.dws_user_action_daycount
    where dt='$do_date'
)1d_ago
on old.user_id=1d_ago.user_id
left join
(
    select
        user_id,
        login_count,
        cart_count,
        favor_count,
        order_count,
        order_activity_count,
        order_activity_reduce_amount,
        order_coupon_count,
        order_coupon_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_amount,
        refund_order_count,
        refund_order_num,
        refund_order_amount,
        refund_payment_count,
        refund_payment_num,
        refund_payment_amount,
        coupon_get_count,
        coupon_using_count,
        coupon_used_count,
        appraise_good_count,
        appraise_mid_count,
        appraise_bad_count,
        appraise_default_count
    from ${APP}.dws_user_action_daycount
    where dt=date_add('$do_date',-7)
)7d_ago
on old.user_id=7d_ago.user_id
left join
(
    select
        user_id,
        login_count,
        cart_count,
        favor_count,
        order_count,
        order_activity_count,
        order_activity_reduce_amount,
        order_coupon_count,
        order_coupon_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_amount,
        refund_order_count,
        refund_order_num,
        refund_order_amount,
        refund_payment_count,
        refund_payment_num,
        refund_payment_amount,
        coupon_get_count,
        coupon_using_count,
        coupon_used_count,
        appraise_good_count,
        appraise_mid_count,
        appraise_bad_count,
        appraise_default_count
    from ${APP}.dws_user_action_daycount
    where dt=date_add('$do_date',-30)
)30d_ago
on old.user_id=30d_ago.user_id;
alter table ${APP}.dwt_user_topic drop partition(dt='$clear_date');
"

dwt_sku_topic="
insert overwrite table ${APP}.dwt_sku_topic partition(dt='$do_date')
select
    nvl(1d_ago.sku_id,old.sku_id),
    nvl(1d_ago.order_count,0),
    nvl(1d_ago.order_num,0),
    nvl(1d_ago.order_activity_count,0),
    nvl(1d_ago.order_coupon_count,0),
    nvl(1d_ago.order_activity_reduce_amount,0.0),
    nvl(1d_ago.order_coupon_reduce_amount,0.0),
    nvl(1d_ago.order_original_amount,0.0),
    nvl(1d_ago.order_final_amount,0.0),
    nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)-nvl(7d_ago.order_count,0),
    nvl(old.order_last_7d_num,0)+nvl(1d_ago.order_num,0)-nvl(7d_ago.order_num,0),
    nvl(old.order_activity_last_7d_count,0)+nvl(1d_ago.order_activity_count,0)-nvl(7d_ago.order_activity_count,0),
    nvl(old.order_coupon_last_7d_count,0)+nvl(1d_ago.order_coupon_count,0)-nvl(7d_ago.order_coupon_count,0),
    nvl(old.order_activity_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)-nvl(7d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)-nvl(7d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(7d_ago.order_original_amount,0.0),
    nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(7d_ago.order_final_amount,0.0),
    nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)-nvl(30d_ago.order_count,0),
    nvl(old.order_last_30d_num,0)+nvl(1d_ago.order_num,0)-nvl(30d_ago.order_num,0),
    nvl(old.order_activity_last_30d_count,0)+nvl(1d_ago.order_activity_count,0)-nvl(30d_ago.order_activity_count,0),
    nvl(old.order_coupon_last_30d_count,0)+nvl(1d_ago.order_coupon_count,0)-nvl(30d_ago.order_coupon_count,0),
    nvl(old.order_activity_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)-
        nvl(30d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)-nvl(30d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(30d_ago.order_original_amount,0.0),
    nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(30d_ago.order_final_amount,0.0),
    nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
    nvl(old.order_num,0)+nvl(1d_ago.order_num,0),
    nvl(old.order_activity_count,0)+nvl(1d_ago.order_activity_count,0),
    nvl(old.order_coupon_count,0)+nvl(1d_ago.order_coupon_count,0),
    nvl(old.order_activity_reduce_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0),
    nvl(old.order_coupon_reduce_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0),
    nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
    nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
    nvl(1d_ago.payment_count,0),
    nvl(1d_ago.payment_num,0),
    nvl(1d_ago.payment_amount,0.0),
    nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)-nvl(7d_ago.payment_count,0),
    nvl(old.payment_last_7d_num,0)+nvl(1d_ago.payment_num,0)-nvl(7d_ago.payment_num,0),
    nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(7d_ago.payment_amount,0.0),
    nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)-nvl(30d_ago.payment_count,0),
    nvl(old.payment_last_30d_num,0)+nvl(1d_ago.payment_num,0)-nvl(30d_ago.payment_num,0),
    nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(30d_ago.payment_amount,0.0),
    nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
    nvl(old.payment_num,0)+nvl(1d_ago.payment_num,0),
    nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
    nvl(1d_ago.refund_order_count,0),
    nvl(1d_ago.refund_order_num,0),
    nvl(1d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)-nvl(7d_ago.refund_order_count,0),
    nvl(old.refund_order_last_7d_num,0)+nvl(1d_ago.refund_order_num,0)-nvl(7d_ago.refund_order_num,0),
    nvl(old.refund_order_last_7d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)-nvl(7d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)-nvl(30d_ago.refund_order_count,0),
    nvl(old.refund_order_last_30d_num,0)+nvl(1d_ago.refund_order_num,0)-nvl(30d_ago.refund_order_num,0),
    nvl(old.refund_order_last_30d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)-nvl(30d_ago.refund_order_amount,0.0),
    nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
    nvl(old.refund_order_num,0)+nvl(1d_ago.refund_order_num,0),
    nvl(old.refund_order_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0),
    nvl(1d_ago.refund_payment_count,0),
    nvl(1d_ago.refund_payment_num,0),
    nvl(1d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(7d_ago.refund_payment_count,0),
    nvl(old.refund_payment_last_7d_num,0)+nvl(1d_ago.refund_payment_num,0)-nvl(7d_ago.refund_payment_num,0),
    nvl(old.refund_payment_last_7d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)-nvl(7d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(30d_ago.refund_payment_count,0),
    nvl(old.refund_payment_last_30d_num,0)+nvl(1d_ago.refund_payment_num,0)-nvl(30d_ago.refund_payment_num,0),
    nvl(old.refund_payment_last_30d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)-nvl(30d_ago.refund_payment_amount,0.0),
    nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
    nvl(old.refund_payment_num,0)+nvl(1d_ago.refund_payment_num,0),
    nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0),
    nvl(1d_ago.cart_count,0),
    nvl(old.cart_last_7d_count,0)+nvl(1d_ago.cart_count,0)-nvl(7d_ago.cart_count,0),
    nvl(old.cart_last_30d_count,0)+nvl(1d_ago.cart_count,0)-nvl(30d_ago.cart_count,0),
    nvl(old.cart_count,0)+nvl(1d_ago.cart_count,0),
    nvl(1d_ago.favor_count,0),
    nvl(old.favor_last_7d_count,0)+nvl(1d_ago.favor_count,0)-nvl(7d_ago.favor_count,0),
    nvl(old.favor_last_30d_count,0)+nvl(1d_ago.favor_count,0)-nvl(30d_ago.favor_count,0),
    nvl(old.favor_count,0)+nvl(1d_ago.favor_count,0),
    nvl(1d_ago.appraise_good_count,0),
    nvl(1d_ago.appraise_mid_count,0),
    nvl(1d_ago.appraise_bad_count,0),
    nvl(1d_ago.appraise_default_count,0),
    nvl(old.appraise_last_7d_good_count,0)+nvl(1d_ago.appraise_good_count,0)-nvl(7d_ago.appraise_good_count,0),
    nvl(old.appraise_last_7d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(7d_ago.appraise_mid_count,0),
    nvl(old.appraise_last_7d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(7d_ago.appraise_bad_count,0),
    nvl(old.appraise_last_7d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(7d_ago.appraise_default_count,0),
    nvl(old.appraise_last_30d_good_count,0)+nvl(1d_ago.appraise_good_count,0)-nvl(30d_ago.appraise_good_count,0),
    nvl(old.appraise_last_30d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(30d_ago.appraise_mid_count,0),
    nvl(old.appraise_last_30d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(30d_ago.appraise_bad_count,0),
    nvl(old.appraise_last_30d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(30d_ago.appraise_default_count,0),
    nvl(old.appraise_good_count,0)+nvl(1d_ago.appraise_good_count,0),
    nvl(old.appraise_mid_count,0)+nvl(1d_ago.appraise_mid_count,0),
    nvl(old.appraise_bad_count,0)+nvl(1d_ago.appraise_bad_count,0),
    nvl(old.appraise_default_count,0)+nvl(1d_ago.appraise_default_count,0)
from
(
    select
        sku_id,
        order_last_1d_count,
        order_last_1d_num,
        order_activity_last_1d_count,
        order_coupon_last_1d_count,
        order_activity_reduce_last_1d_amount,
        order_coupon_reduce_last_1d_amount,
        order_last_1d_original_amount,
        order_last_1d_final_amount,
        order_last_7d_count,
        order_last_7d_num,
        order_activity_last_7d_count,
        order_coupon_last_7d_count,
        order_activity_reduce_last_7d_amount,
        order_coupon_reduce_last_7d_amount,
        order_last_7d_original_amount,
        order_last_7d_final_amount,
        order_last_30d_count,
        order_last_30d_num,
        order_activity_last_30d_count,
        order_coupon_last_30d_count,
        order_activity_reduce_last_30d_amount,
        order_coupon_reduce_last_30d_amount,
        order_last_30d_original_amount,
        order_last_30d_final_amount,
        order_count,
        order_num,
        order_activity_count,
        order_coupon_count,
        order_activity_reduce_amount,
        order_coupon_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_last_1d_count,
        payment_last_1d_num,
        payment_last_1d_amount,
        payment_last_7d_count,
        payment_last_7d_num,
        payment_last_7d_amount,
        payment_last_30d_count,
        payment_last_30d_num,
        payment_last_30d_amount,
        payment_count,
        payment_num,
        payment_amount,
        refund_order_last_1d_count,
        refund_order_last_1d_num,
        refund_order_last_1d_amount,
        refund_order_last_7d_count,
        refund_order_last_7d_num,
        refund_order_last_7d_amount,
        refund_order_last_30d_count,
        refund_order_last_30d_num,
        refund_order_last_30d_amount,
        refund_order_count,
        refund_order_num,
        refund_order_amount,
        refund_payment_last_1d_count,
        refund_payment_last_1d_num,
        refund_payment_last_1d_amount,
        refund_payment_last_7d_count,
        refund_payment_last_7d_num,
        refund_payment_last_7d_amount,
        refund_payment_last_30d_count,
        refund_payment_last_30d_num,
        refund_payment_last_30d_amount,
        refund_payment_count,
        refund_payment_num,
        refund_payment_amount,
        cart_last_1d_count,
        cart_last_7d_count,
        cart_last_30d_count,
        cart_count,
        favor_last_1d_count,
        favor_last_7d_count,
        favor_last_30d_count,
        favor_count,
        appraise_last_1d_good_count,
        appraise_last_1d_mid_count,
        appraise_last_1d_bad_count,
        appraise_last_1d_default_count,
        appraise_last_7d_good_count,
        appraise_last_7d_mid_count,
        appraise_last_7d_bad_count,
        appraise_last_7d_default_count,
        appraise_last_30d_good_count,
        appraise_last_30d_mid_count,
        appraise_last_30d_bad_count,
        appraise_last_30d_default_count,
        appraise_good_count,
        appraise_mid_count,
        appraise_bad_count,
        appraise_default_count
    from ${APP}.dwt_sku_topic
    where dt=date_add('$do_date',-1)
)old
full outer join
(
    select
        sku_id,
        order_count,
        order_num,
        order_activity_count,
        order_coupon_count,
        order_activity_reduce_amount,
        order_coupon_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_num,
        payment_amount,
        refund_order_count,
        refund_order_num,
        refund_order_amount,
        refund_payment_count,
        refund_payment_num,
        refund_payment_amount,
        cart_count,
        favor_count,
        appraise_good_count,
        appraise_mid_count,
        appraise_bad_count,
        appraise_default_count
    from ${APP}.dws_sku_action_daycount
    where dt='$do_date'
)1d_ago
on old.sku_id=1d_ago.sku_id
left join
(
    select
        sku_id,
        order_count,
        order_num,
        order_activity_count,
        order_coupon_count,
        order_activity_reduce_amount,
        order_coupon_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_num,
        payment_amount,
        refund_order_count,
        refund_order_num,
        refund_order_amount,
        refund_payment_count,
        refund_payment_num,
        refund_payment_amount,
        cart_count,
        favor_count,
        appraise_good_count,
        appraise_mid_count,
        appraise_bad_count,
        appraise_default_count
    from ${APP}.dws_sku_action_daycount
    where dt=date_add('$do_date',-7)
)7d_ago
on old.sku_id=7d_ago.sku_id
left join
(
    select
        sku_id,
        order_count,
        order_num,
        order_activity_count,
        order_coupon_count,
        order_activity_reduce_amount,
        order_coupon_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_num,
        payment_amount,
        refund_order_count,
        refund_order_num,
        refund_order_amount,
        refund_payment_count,
        refund_payment_num,
        refund_payment_amount,
        cart_count,
        favor_count,
        appraise_good_count,
        appraise_mid_count,
        appraise_bad_count,
        appraise_default_count
    from ${APP}.dws_sku_action_daycount
    where dt=date_add('$do_date',-30)
)30d_ago
on old.sku_id=30d_ago.sku_id;
alter table ${APP}.dwt_sku_topic
    drop partition(dt='$clear_date');
"

dwt_activity_topic="
insert overwrite table ${APP}.dwt_activity_topic partition(dt='$do_date')
select
    nvl(1d_ago.activity_rule_id,old.activity_rule_id),
    nvl(1d_ago.activity_id,old.activity_id),
    nvl(1d_ago.order_count,0),
    nvl(1d_ago.order_reduce_amount,0.0),
    nvl(1d_ago.order_original_amount,0.0),
    nvl(1d_ago.order_final_amount,0.0),
    nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
    nvl(old.order_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0),
    nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
    nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
    nvl(1d_ago.payment_count,0),
    nvl(1d_ago.payment_reduce_amount,0.0),
    nvl(1d_ago.payment_amount,0.0),
    nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
    nvl(old.payment_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0),
    nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0)
from
(
    select
        activity_rule_id,
        activity_id,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount
    from ${APP}.dwt_activity_topic
    where dt=date_add('$do_date',-1)
)old
full outer join
(
    select
        activity_rule_id,
        activity_id,
        order_count,
        order_reduce_amount,
        order_original_amount,
        order_final_amount,
        payment_count,
        payment_reduce_amount,
        payment_amount
    from ${APP}.dws_activity_info_daycount
    where dt='$do_date'
)1d_ago
on old.activity_rule_id=1d_ago.activity_rule_id;
alter table ${APP}.dwt_activity_topic drop partition(dt='$clear_date');
"

dwt_coupon_topic="
insert overwrite table ${APP}.dwt_coupon_topic partition(dt='$do_date')
select
    nvl(1d_ago.coupon_id,old.coupon_id),
    nvl(1d_ago.get_count,0),
    nvl(old.get_last_7d_count,0)+nvl(1d_ago.get_count,0)-nvl(7d_ago.get_count,0),
    nvl(old.get_last_30d_count,0)+nvl(1d_ago.get_count,0)-nvl(30d_ago.get_count,0),
    nvl(old.get_count,0)+nvl(1d_ago.get_count,0),
    nvl(1d_ago.order_count,0),
    nvl(1d_ago.order_reduce_amount,0.0),
    nvl(1d_ago.order_original_amount,0.0),
    nvl(1d_ago.order_final_amount,0.0),
    nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)-nvl(7d_ago.order_count,0),
    nvl(old.order_last_7d_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0)-nvl(7d_ago.order_reduce_amount,0.0),
    nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(7d_ago.order_original_amount,0.0),
    nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(7d_ago.order_final_amount,0.0),
    nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)-nvl(30d_ago.order_count,0),
    nvl(old.order_last_30d_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0)-nvl(30d_ago.order_reduce_amount,0.0),
    nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)-nvl(30d_ago.order_original_amount,0.0),
    nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)-nvl(30d_ago.order_final_amount,0.0),
    nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
    nvl(old.order_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0),
    nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
    nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
    nvl(1d_ago.payment_count,0),
    nvl(1d_ago.payment_reduce_amount,0.0),
    nvl(1d_ago.payment_amount,0.0),
    nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)-nvl(7d_ago.payment_count,0),
    nvl(old.payment_last_7d_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0)-nvl(7d_ago.payment_reduce_amount,0.0),
    nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(7d_ago.payment_amount,0.0),
    nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)-nvl(30d_ago.payment_count,0),
nvl(old.payment_last_30d_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0)- nvl(30d_ago.payment_reduce_amount,0.0), nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(30d_ago.payment_amount,0.0), nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0), nvl(old.payment_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0), nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0), nvl(1d_ago.expire_count,0), nvl(old.expire_last_7d_count,0)+nvl(1d_ago.expire_count,0)- nvl(7d_ago.expire_count,0), nvl(old.expire_last_30d_count,0)+nvl(1d_ago.expire_count,0)- nvl(30d_ago.expire_count,0), nvl(old.expire_count,0)+nvl(1d_ago.expire_count,0) from ( select coupon_id, get_last_1d_count, get_last_7d_count, get_last_30d_count, get_count, order_last_1d_count, order_last_1d_reduce_amount, order_last_1d_original_amount, order_last_1d_final_amount, order_last_7d_count, order_last_7d_reduce_amount, order_last_7d_original_amount, order_last_7d_final_amount, order_last_30d_count, order_last_30d_reduce_amount, order_last_30d_original_amount, order_last_30d_final_amount, order_count, order_reduce_amount, order_original_amount, order_final_amount, payment_last_1d_count, payment_last_1d_reduce_amount, payment_last_1d_amount, payment_last_7d_count, payment_last_7d_reduce_amount, payment_last_7d_amount, payment_last_30d_count, payment_last_30d_reduce_amount, payment_last_30d_amount, payment_count, payment_reduce_amount, payment_amount, expire_last_1d_count, expire_last_7d_count, expire_last_30d_count, expire_count from ${APP}.dwt_coupon_topic where dt=date_add('$do_date',-1) )old full outer join ( select coupon_id, get_count, order_count, order_reduce_amount, order_original_amount, order_final_amount, payment_count, payment_reduce_amount, payment_amount, expire_count from ${APP}.dws_coupon_info_daycount where dt='$do_date' )1d_ago on old.coupon_id=1d_ago.coupon_id left join ( select coupon_id, get_count, order_count, order_reduce_amount, order_original_amount, 
order_final_amount, payment_count, payment_reduce_amount, payment_amount, expire_count from ${APP}.dws_coupon_info_daycount where dt=date_add('$do_date',-7) )7d_ago on old.coupon_id=7d_ago.coupon_id left join ( select coupon_id, get_count, order_count, order_reduce_amount, order_original_amount, order_final_amount, payment_count, payment_reduce_amount, payment_amount, expire_count from ${APP}.dws_coupon_info_daycount where dt=date_add('$do_date',-30) )30d_ago on old.coupon_id=30d_ago.coupon_id; alter table ${APP}.dwt_coupon_topic drop partition(dt='$clear_date'); " dwt_area_topic=" insert overwrite table ${APP}.dwt_area_topic partition(dt='$do_date') select nvl(old.province_id, 1d_ago.province_id), nvl(1d_ago.visit_count,0), nvl(1d_ago.login_count,0), nvl(old.visit_last_7d_count,0)+nvl(1d_ago.visit_count,0)- nvl(7d_ago.visit_count,0), nvl(old.login_last_7d_count,0)+nvl(1d_ago.login_count,0)- nvl(7d_ago.login_count,0), nvl(old.visit_last_30d_count,0)+nvl(1d_ago.visit_count,0)- nvl(30d_ago.visit_count,0), nvl(old.login_last_30d_count,0)+nvl(1d_ago.login_count,0)- nvl(30d_ago.login_count,0), nvl(old.visit_count,0)+nvl(1d_ago.visit_count,0), nvl(old.login_count,0)+nvl(1d_ago.login_count,0), nvl(1d_ago.order_count,0), nvl(1d_ago.order_original_amount,0.0), nvl(1d_ago.order_final_amount,0.0), nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)- nvl(7d_ago.order_count,0), nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(7d_ago.order_original_amount,0.0), nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(7d_ago.order_final_amount,0.0), nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)- nvl(30d_ago.order_count,0), nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(30d_ago.order_original_amount,0.0), nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(30d_ago.order_final_amount,0.0), nvl(old.order_count,0)+nvl(1d_ago.order_count,0), 
nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0), nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0), nvl(1d_ago.payment_count,0), nvl(1d_ago.payment_amount,0.0), nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)- nvl(7d_ago.payment_count,0), nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(7d_ago.payment_amount,0.0), nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)- nvl(30d_ago.payment_count,0), nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(30d_ago.payment_amount,0.0), nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0), nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0), nvl(1d_ago.refund_order_count,0), nvl(1d_ago.refund_order_amount,0.0), nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(7d_ago.refund_order_count,0), nvl(old.refund_order_last_7d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)- nvl(7d_ago.refund_order_amount,0.0), nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(30d_ago.refund_order_count,0), nvl(old.refund_order_last_30d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)- nvl(30d_ago.refund_order_amount,0.0), nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0), nvl(old.refund_order_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0), nvl(1d_ago.refund_payment_count,0), nvl(1d_ago.refund_payment_amount,0.0), nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)- nvl(7d_ago.refund_payment_count,0), nvl(old.refund_payment_last_7d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)- nvl(7d_ago.refund_payment_amount,0.0), nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)- nvl(30d_ago.refund_payment_count,0), nvl(old.refund_payment_last_30d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)- nvl(30d_ago.refund_payment_amount,0.0), nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0), 
nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0) from ( select province_id, visit_last_1d_count, login_last_1d_count, visit_last_7d_count, login_last_7d_count, visit_last_30d_count, login_last_30d_count, visit_count, login_count, order_last_1d_count, order_last_1d_original_amount, order_last_1d_final_amount, order_last_7d_count, order_last_7d_original_amount, order_last_7d_final_amount, order_last_30d_count, order_last_30d_original_amount, order_last_30d_final_amount, order_count, order_original_amount, order_final_amount, payment_last_1d_count, payment_last_1d_amount, payment_last_7d_count, payment_last_7d_amount, payment_last_30d_count, payment_last_30d_amount, payment_count, payment_amount, refund_order_last_1d_count, refund_order_last_1d_amount, refund_order_last_7d_count, refund_order_last_7d_amount, refund_order_last_30d_count, refund_order_last_30d_amount, refund_order_count, refund_order_amount, refund_payment_last_1d_count, refund_payment_last_1d_amount, refund_payment_last_7d_count, refund_payment_last_7d_amount, refund_payment_last_30d_count, refund_payment_last_30d_amount, refund_payment_count, refund_payment_amount from ${APP}.dwt_area_topic where dt=date_add('$do_date',-1) )old full outer join ( select province_id, visit_count, login_count, order_count, order_original_amount, order_final_amount, payment_count, payment_amount, refund_order_count, refund_order_amount, refund_payment_count, refund_payment_amount from ${APP}.dws_area_stats_daycount where dt='$do_date' )1d_ago on old.province_id=1d_ago.province_id left join ( select province_id, visit_count, login_count, order_count, order_original_amount, order_final_amount, payment_count, payment_amount, refund_order_count, refund_order_amount, refund_payment_count, refund_payment_amount from ${APP}.dws_area_stats_daycount where dt=date_add('$do_date',-7) )7d_ago on old.province_id= 7d_ago.province_id left join ( select province_id, visit_count, login_count, order_count, 
order_original_amount, order_final_amount, payment_count, payment_amount, refund_order_count, refund_order_amount, refund_payment_count, refund_payment_amount from ${APP}.dws_area_stats_daycount where dt=date_add('$do_date',-30) )30d_ago on old.province_id= 30d_ago.province_id; alter table ${APP}.dwt_area_topic drop partition(dt='$clear_date'); " case $1 in "dwt_visitor_topic" ) hive -e "$dwt_visitor_topic" hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_visitor_topic/dt=$clear_date ;; "dwt_user_topic" ) hive -e "$dwt_user_topic" hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_user_topic/dt=$clear_date ;; "dwt_sku_topic" ) hive -e "$dwt_sku_topic" hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_sku_topic/dt=$clear_date ;; "dwt_activity_topic" ) hive -e "$dwt_activity_topic" hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_activity_topic/dt=$clear_date ;; "dwt_coupon_topic" ) hive -e "$dwt_coupon_topic" hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_coupon_topic/dt=$clear_date ;; "dwt_area_topic" ) hive -e "$dwt_area_topic" hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_area_topic/dt=$clear_date ;; "all" ) hive -e "$dwt_visitor_topic$dwt_user_topic$dwt_sku_topic$dwt_activity_topic$dwt_coupon_topic$dwt_area_topic" hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_visitor_topic/dt=$clear_date hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_user_topic/dt=$clear_date hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_sku_topic/dt=$clear_date hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_activity_topic/dt=$clear_date hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_coupon_topic/dt=$clear_date hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_area_topic/dt=$clear_date ;; esac
Grant the script execute permission
chmod +x dws_to_dwt.sh
Run the script
dws_to_dwt.sh 2020-06-14
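The DWT load statements in the script maintain the 7-day and 30-day columns incrementally: the previous day's window total, plus today's daily value, minus the daily value that has just dropped out of the window. The following is a minimal Python sketch of that arithmetic only. The function name `roll_window` and the daily counts are made up for illustration and are not part of the project.

```python
def roll_window(prev_window_total, today_count, expired_count):
    """Slide a fixed-length window forward by one day.

    prev_window_total: window total as of yesterday (old.*_last_7d_count)
    today_count:       today's daily value (1d_ago.*)
    expired_count:     daily value of the day leaving the window (7d_ago.*)
    None stands in for SQL NULL and is treated as 0, like nvl(x, 0).
    """
    def nvl(x):
        return 0 if x is None else x

    return nvl(prev_window_total) + nvl(today_count) - nvl(expired_count)


# Made-up daily order counts for 9 consecutive days.
daily = [3, 0, 5, 2, 1, 4, 6, 7, 2]

window = 7
total = None
history = []
for i, today in enumerate(daily):
    # The day that falls out of the window is exactly `window` days back.
    expired = daily[i - window] if i >= window else None
    total = roll_window(total, today, expired)
    history.append(total)
```

Each entry of `history` equals the plain sum of the last seven daily values, which is why the script only needs yesterday's DWT partition and two DWS partitions rather than a full rescan.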
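# Data Warehouse Construction - ADS Layer
The ADS layer involves no modeling; its tables are created according to the specific reporting requirements.
# Visitor Topic
# Visitor Statistics
This requirement is a combined visitor report made up of several metrics, each explained below.
Metric | Description | Column |
---|---|---|
Visitors | Number of distinct visitors | uv_count |
Page dwell time | Total dwell time over all page view records, in seconds | duration_sec |
Average page dwell time | Average dwell time per session, in seconds | avg_duration_sec |
Total page views | Total number of page view records | page_count |
Average page views | Average number of pages viewed per session | avg_page_count |
Sessions | Total number of sessions | sv_count |
Bounces | Number of sessions that viewed only one page | bounce_count |
Bounce rate | Share of sessions that viewed only one page | bounce_rate |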
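To make the metric definitions concrete, here is a minimal Python sketch that derives them from a handful of already-sessionized records. The sample sessions and the helper name `visit_stats` are hypothetical. The truncation to whole numbers mirrors the `cast(... as bigint)` calls in the load statement, and the bounce rate is expressed as a percentage.

```python
# Hypothetical sessions: one dict per session, durations in milliseconds.
sessions = [
    {"mid_id": "m1", "page_count": 1, "duration_ms": 4000},
    {"mid_id": "m1", "page_count": 3, "duration_ms": 15000},
    {"mid_id": "m2", "page_count": 1, "duration_ms": 2000},
    {"mid_id": "m3", "page_count": 5, "duration_ms": 39000},
]


def visit_stats(sessions):
    """Compute the visitor metrics for one (is_new, recent_days, channel) group."""
    sv_count = len(sessions)
    total_ms = sum(s["duration_ms"] for s in sessions)
    total_pages = sum(s["page_count"] for s in sessions)
    bounce_count = sum(1 for s in sessions if s["page_count"] == 1)
    return {
        "uv_count": len({s["mid_id"] for s in sessions}),
        "duration_sec": total_ms // 1000,
        "avg_duration_sec": int(total_ms / sv_count / 1000),
        "page_count": total_pages,
        "avg_page_count": int(total_pages / sv_count),
        "sv_count": sv_count,
        "bounce_count": bounce_count,
        "bounce_rate": round(bounce_count / sv_count * 100, 2),
    }


stats = visit_stats(sessions)
```

Note that `avg_page_count` is truncated to an integer because the column is declared as BIGINT, so 2.5 pages per session is stored as 2.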
Table creation statement
DROP TABLE IF EXISTS ads_visit_stats;
CREATE EXTERNAL TABLE ads_visit_stats
(
    `dt` STRING COMMENT 'Statistics date',
    `is_new` STRING COMMENT 'New/returning flag, 1: new, 0: returning',
    `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `channel` STRING COMMENT 'Channel',
    `uv_count` BIGINT COMMENT 'Daily active visitors (number of visitors)',
    `duration_sec` BIGINT COMMENT 'Total page dwell time',
    `avg_duration_sec` BIGINT COMMENT 'Average page dwell time per session, in seconds',
    `page_count` BIGINT COMMENT 'Total page views',
    `avg_page_count` BIGINT COMMENT 'Average page views per session',
    `sv_count` BIGINT COMMENT 'Number of sessions',
    `bounce_count` BIGINT COMMENT 'Number of bounces',
    `bounce_rate` DECIMAL(16,2) COMMENT 'Bounce rate'
) COMMENT 'Visitor statistics'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_visit_stats/';
Data loading
Split all page view records into sessions
Compute the dwell time and number of pages viewed for each session
Compute the metrics listed above
insert overwrite table ads_visit_stats
select * from ads_visit_stats
union
select
    '2020-06-14' dt,
    is_new,
    recent_days,
    channel,
    count(distinct(mid_id)) uv_count,
    cast(sum(duration)/1000 as bigint) duration_sec,
    cast(avg(duration)/1000 as bigint) avg_duration_sec,
    sum(page_count) page_count,
    cast(avg(page_count) as bigint) avg_page_count,
    count(*) sv_count,
    sum(if(page_count=1,1,0)) bounce_count,
    cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
from
(
    -- one row per session: page views and dwell time
    select
        session_id, mid_id, is_new, recent_days, channel,
        count(*) page_count,
        sum(during_time) duration
    from
    (
        -- a session starts at the most recent page that has no previous page
        select
            mid_id, channel, recent_days, is_new, last_page_id, page_id, during_time,
            concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
        from
        (
            select
                mid_id, channel, last_page_id, page_id, during_time, ts, recent_days,
                if(visit_date_first>=date_add('2020-06-14',-recent_days+1),'1','0') is_new
            from
            (
                select
                    t1.mid_id, t1.channel, t1.last_page_id, t1.page_id, t1.during_time, t1.dt, t1.ts,
                    t2.visit_date_first
                from
                (
                    select mid_id, channel, last_page_id, page_id, during_time, dt, ts
                    from dwd_page_log
                    where dt>=date_add('2020-06-14',-30)
                )t1
                left join
                (
                    select mid_id, visit_date_first
                    from dwt_visitor_topic
                    where dt='2020-06-14'
                )t2
                on t1.mid_id=t2.mid_id
            )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('2020-06-14',-recent_days+1)
        )t4
    )t5
    group by session_id,mid_id,is_new,recent_days,channel
)t6
group by is_new,recent_days,channel;
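The session split in the statement above relies on one idea: a page view whose `last_page_id` is null opens a new session, and every later page view of the same device belongs to that session until the next such row. A minimal Python sketch of that rule follows. The function name `assign_sessions` and the sample log rows are hypothetical, and the sketch ignores the `recent_days` dimension for brevity.

```python
def assign_sessions(page_logs):
    """Attach a session_id to every page view.

    Mirrors last_value(if(last_page_id is null, ts, null), true)
    over (partition by mid_id order by ts): the session id is the device id
    plus the timestamp of the latest page view that had no previous page.
    """
    session_start = {}  # mid_id -> ts of the current session's first page
    result = []
    for row in sorted(page_logs, key=lambda r: (r["mid_id"], r["ts"])):
        if row["last_page_id"] is None:
            session_start[row["mid_id"]] = row["ts"]
        start = session_start.get(row["mid_id"])
        # If no session start has been seen yet, the id stays None (NULL in SQL).
        session_id = None if start is None else f'{row["mid_id"]}-{start}'
        result.append({**row, "session_id": session_id})
    return result


# Hypothetical page log rows.
logs = [
    {"mid_id": "m1", "last_page_id": None, "page_id": "home", "ts": 100},
    {"mid_id": "m1", "last_page_id": "home", "page_id": "good_detail", "ts": 200},
    {"mid_id": "m1", "last_page_id": None, "page_id": "home", "ts": 900},
    {"mid_id": "m2", "last_page_id": None, "page_id": "search", "ts": 150},
]

sessionized = assign_sessions(logs)
session_ids = [r["session_id"] for r in sessionized]
```

Grouping `sessionized` by `session_id` then gives the per-session page count and dwell time that the outer query aggregates.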
# Path Analysis
Table creation statement
DROP TABLE IF EXISTS ads_page_path;
CREATE EXTERNAL TABLE ads_page_path
(
    `dt` STRING COMMENT 'Statistics date',
    `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `source` STRING COMMENT 'ID of the page the jump starts from',
    `target` STRING COMMENT 'ID of the page the jump ends at',
    `path_count` BIGINT COMMENT 'Number of jumps'
) COMMENT 'Page browsing paths'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_page_path/';
Data loading
insert overwrite table ads_page_path
select * from ads_page_path
union
select
    '2020-06-14',
    recent_days,
    source,
    target,
    count(*)
from
(
    -- prefix each page with its step number so the same page at different depths stays distinct
    select
        recent_days,
        concat('step-',step,':',source) source,
        concat('step-',step+1,':',target) target
    from
    (
        select
            recent_days,
            page_id source,
            lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
            row_number() over (partition by recent_days,session_id order by ts) step
        from
        (
            select
                recent_days, last_page_id, page_id, ts,
                concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
            from dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('2020-06-14',-30)
            and dt>=date_add('2020-06-14',-recent_days+1)
        )t2
    )t3
)t4
group by recent_days,source,target;
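The path statement pairs every page with the next page in the same session (`lead`) and numbers the pages (`row_number`) so that each edge can be labelled with its step. The Python sketch below reproduces that pairing on hypothetical sessions. `page_paths` is an illustrative name, and the last page of a session gets a `None` target, which assumes that a NULL target propagates through `concat` as NULL.

```python
from collections import Counter


def page_paths(sessions):
    """Count step-labelled (source, target) jumps over all sessions.

    sessions: list of page-id lists, each already ordered by timestamp.
    """
    counts = Counter()
    for pages in sessions:
        for step, source in enumerate(pages, start=1):
            # With 1-based steps, pages[step] is the page after `source`.
            target = pages[step] if step < len(pages) else None
            source_label = f"step-{step}:{source}"
            target_label = None if target is None else f"step-{step + 1}:{target}"
            counts[(source_label, target_label)] += 1
    return counts


# Hypothetical sessions.
sessions = [
    ["home", "good_detail", "cart"],
    ["home", "good_detail"],
    ["home"],
]

paths = page_paths(sessions)
```

The step prefix is what keeps the resulting graph acyclic: a visit that returns to `home` later in the session produces `step-3:home`, not a loop back to `step-1:home`.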
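# User Topic
# User Statistics
This requirement is a combined user report made up of several metrics, each explained below.
Metric | Description | Column |
---|---|---|
New users | Number of newly registered users | new_user_count |
New ordering users | Number of users who placed their first order | new_order_user_count |
Total order amount | Total amount of all orders | order_final_amount |
Ordering users | Total number of users who placed an order | order_user_count |
Non-ordering users | Number of users who were active but placed no order | no_order_user_count |
Table creation statement
DROP TABLE IF EXISTS ads_user_total;
CREATE EXTERNAL TABLE `ads_user_total`
(
    `dt` STRING COMMENT 'Statistics date',
    `recent_days` BIGINT COMMENT 'Recent days, 0: cumulative, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `new_user_count` BIGINT COMMENT 'Number of newly registered users',
    `new_order_user_count` BIGINT COMMENT 'Number of new ordering users',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Total order amount',
    `order_user_count` BIGINT COMMENT 'Number of ordering users',
    `no_order_user_count` BIGINT COMMENT 'Number of non-ordering users (active users who placed no order)'
) COMMENT 'User statistics'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_total/';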
Data loading
insert overwrite table ads_user_total
select * from ads_user_total
union
select
    '2020-06-14',
    recent_days,
    sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
    sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count,
    sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
from
(
    select
        recent_days, user_id, login_date_first, login_date_last, order_date_first,
        -- pick the amount column that matches the window
        case
            when recent_days=0 then order_final_amount
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_final_amount,
        -- first day of the window; 0 means cumulative, so start from the epoch
        if(recent_days=0,'1970-01-01',date_add('2020-06-14',-recent_days+1)) recent_days_ago
    from dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
    where dt='2020-06-14'
)t1
group by recent_days;
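The statement above repeats every user row once per window (`explode(Array(0,1,7,30))`) and compares dates against the first day of that window. A minimal Python sketch of the window boundary and the new-user count follows. The user rows and the helper name `window_start` are hypothetical.

```python
from datetime import date, timedelta


def window_start(stat_date, recent_days):
    """First day of the window ending on stat_date.

    recent_days == 0 means cumulative, so the window starts at the epoch,
    matching if(recent_days=0,'1970-01-01',date_add(stat_date,-recent_days+1)).
    """
    if recent_days == 0:
        return date(1970, 1, 1)
    return stat_date - timedelta(days=recent_days - 1)


stat_date = date(2020, 6, 14)

# Hypothetical users with their first login date.
users = [
    {"user_id": "u1", "login_date_first": date(2020, 6, 14)},
    {"user_id": "u2", "login_date_first": date(2020, 6, 9)},
    {"user_id": "u3", "login_date_first": date(2020, 5, 20)},
    {"user_id": "u4", "login_date_first": date(2020, 1, 1)},
]

# One result per window, like "group by recent_days" after the explode.
new_user_count = {
    n: sum(1 for u in users if u["login_date_first"] >= window_start(stat_date, n))
    for n in (0, 1, 7, 30)
}
```

The `-recent_days+1` offset makes the window include the statistics date itself, so the 7-day window on 2020-06-14 starts on 2020-06-08.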
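# User Change Statistics
This requirement covers two metrics, churned users and returning users, explained below.
Metric | Description | Column |
---|---|---|
Churned users | A user who was active before but has not been active recently is a churned user. Here we count the users who were active 7 days ago (on that day only) and have not been active in the last 7 days. | user_churn_count |
Returning users | A previously active user who was inactive for a period (churned) and is active again today is a returning user. Here we count the total number of returning users. | user_back_count |
Table creation statement
DROP TABLE IF EXISTS ads_user_change;
CREATE EXTERNAL TABLE `ads_user_change`
(
    `dt` STRING COMMENT 'Statistics date',
    `user_churn_count` BIGINT COMMENT 'Number of churned users',
    `user_back_count` BIGINT COMMENT 'Number of returning users'
) COMMENT 'User change statistics'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_change/';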
Data loading
insert overwrite table ads_user_change
select * from ads_user_change
union
select
    churn.dt,
    user_churn_count,
    user_back_count
from
(
    -- churned: last login was exactly 7 days ago
    select
        '2020-06-14' dt,
        count(*) user_churn_count
    from dwt_user_topic
    where dt='2020-06-14'
    and login_date_last=date_add('2020-06-14',-7)
)churn
join
(
    -- returning: logged in today, and the previous last login is at least 8 days earlier
    select
        '2020-06-14' dt,
        count(*) user_back_count
    from
    (
        select user_id, login_date_last
        from dwt_user_topic
        where dt='2020-06-14'
        and login_date_last='2020-06-14'
    )t1
    join
    (
        select user_id, login_date_last login_date_previous
        from dwt_user_topic
        where dt=date_add('2020-06-14',-1)
    )t2
    on t1.user_id=t2.user_id
    where datediff(login_date_last,login_date_previous)>=8
)back
on churn.dt=back.dt;
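Both metrics compare snapshots of each user's last login date. The sketch below, in Python with hypothetical snapshot dictionaries, shows the two conditions side by side: churned users have a last login exactly 7 days before the statistics date, and returning users logged in today while yesterday's snapshot shows a last login at least 8 days earlier.

```python
from datetime import date, timedelta

stat_date = date(2020, 6, 14)

# Hypothetical snapshots: user_id -> last login date,
# as of the statistics date and as of the day before.
last_login_today = {
    "u1": date(2020, 6, 7),   # silent for 7 days: churned
    "u2": date(2020, 6, 14),  # back after a long gap: returning
    "u3": date(2020, 6, 14),  # also active yesterday: neither
    "u4": date(2020, 6, 14),  # new user, absent from yesterday's snapshot
}
last_login_yesterday = {
    "u1": date(2020, 6, 7),
    "u2": date(2020, 6, 1),
    "u3": date(2020, 6, 13),
}

churn_day = stat_date - timedelta(days=7)
user_churn_count = sum(1 for d in last_login_today.values() if d == churn_day)

user_back_count = sum(
    1
    for user_id, last in last_login_today.items()
    if last == stat_date
    and user_id in last_login_yesterday  # inner join on user_id
    and (last - last_login_yesterday[user_id]).days >= 8
)
```

Counting only users whose last login is exactly 7 days old means each churned user is counted once, on the day they cross the threshold, rather than on every following day.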
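# User Behavior Funnel Analysis
Table creation statement
DROP TABLE IF EXISTS ads_user_action;
CREATE EXTERNAL TABLE `ads_user_action`
(
    `dt` STRING COMMENT 'Statistics date',
    `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `home_count` BIGINT COMMENT 'Number of visitors who viewed the home page',
    `good_detail_count` BIGINT COMMENT 'Number of visitors who viewed a product detail page',
    `cart_count` BIGINT COMMENT 'Number of users who added to cart',
    `order_count` BIGINT COMMENT 'Number of users who placed an order',
    `payment_count` BIGINT COMMENT 'Number of users who paid'
) COMMENT 'Funnel analysis'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_action/';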
Data loading
with
tmp_page as
(
    -- visitors who saw the home page / a product detail page, per window
    select
        '2020-06-14' dt,
        recent_days,
        sum(if(array_contains(pages,'home'),1,0)) home_count,
        sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
    from
    (
        select
            recent_days,
            mid_id,
            collect_set(page_id) pages
        from
        (
            select dt, mid_id, page.page_id
            from dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
            where dt>=date_add('2020-06-14',-29)
            and page.page_id in('home','good_detail')
        )t1 lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('2020-06-14',-recent_days+1)
        group by recent_days,mid_id
    )t2
    group by recent_days
),
tmp_cop as
(
    -- users who added to cart / ordered / paid, per window
    select
        '2020-06-14' dt,
        recent_days,
        sum(if(cart_count>0,1,0)) cart_count,
        sum(if(order_count>0,1,0)) order_count,
        sum(if(payment_count>0,1,0)) payment_count
    from
    (
        select
            recent_days,
            user_id,
            case
                when recent_days=1 then cart_last_1d_count
                when recent_days=7 then cart_last_7d_count
                when recent_days=30 then cart_last_30d_count
            end cart_count,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then payment_last_1d_count
                when recent_days=7 then payment_last_7d_count
                when recent_days=30 then payment_last_30d_count
            end payment_count
        from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days
)
insert overwrite table ads_user_action
select * from ads_user_action
union
select
    tmp_page.dt,
    tmp_page.recent_days,
    home_count,
    good_detail_count,
    cart_count,
    order_count,
    payment_count
from tmp_page
join tmp_cop
on tmp_page.recent_days=tmp_cop.recent_days;
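The first two funnel steps are counted per visitor, not per page view: `collect_set` gathers the distinct pages each device saw, and `array_contains` then checks membership once per device. A minimal Python sketch of that step on hypothetical page views:

```python
from collections import defaultdict

# Hypothetical (mid_id, page_id) page views inside one window.
page_views = [
    ("m1", "home"),
    ("m1", "good_detail"),
    ("m1", "home"),        # repeat view, must not be counted twice
    ("m2", "home"),
    ("m3", "good_detail"),
]

# collect_set(page_id) grouped by mid_id
pages_by_visitor = defaultdict(set)
for mid_id, page_id in page_views:
    pages_by_visitor[mid_id].add(page_id)

# sum(if(array_contains(pages, ...), 1, 0))
home_count = sum(1 for pages in pages_by_visitor.values() if "home" in pages)
good_detail_count = sum(
    1 for pages in pages_by_visitor.values() if "good_detail" in pages
)
```

The remaining steps (cart, order, payment) are counted the same way in the statement, except that the per-user counts come ready-made from `dwt_user_topic`, so a simple `> 0` test replaces the set membership check.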
# User Retention Rate
Table creation statement
DROP TABLE IF EXISTS ads_user_retention;
CREATE EXTERNAL TABLE ads_user_retention
(
    `dt` STRING COMMENT 'Statistics date',
    `create_date` STRING COMMENT 'Date the users were added',
    `retention_day` BIGINT COMMENT 'Days retained as of the statistics date',
    `retention_count` BIGINT COMMENT 'Number of retained users',
    `new_user_count` BIGINT COMMENT 'Number of new users',
    `retention_rate` DECIMAL(16,2) COMMENT 'Retention rate'
) COMMENT 'User retention rate'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_user_retention/';
Data loading
insert overwrite table ads_user_retention
select * from ads_user_retention
union
select
    '2020-06-14',
    login_date_first create_date,
    datediff('2020-06-14',login_date_first) retention_day,
    sum(if(login_date_last='2020-06-14',1,0)) retention_count,
    count(*) new_user_count,
    cast(sum(if(login_date_last='2020-06-14',1,0))/count(*)*100 as decimal(16,2)) retention_rate
from dwt_user_topic
where dt='2020-06-14'
and login_date_first>=date_add('2020-06-14',-7)
and login_date_first<'2020-06-14'
group by login_date_first;
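The retention statement groups users into cohorts by first login date (only cohorts from the previous 7 days) and treats a user as retained when the last login equals the statistics date. The Python sketch below walks through the same grouping with hypothetical users. The variable names are illustrative only.

```python
from collections import defaultdict
from datetime import date, timedelta

stat_date = date(2020, 6, 14)

# Hypothetical users as (login_date_first, login_date_last).
users = [
    (date(2020, 6, 13), date(2020, 6, 14)),
    (date(2020, 6, 13), date(2020, 6, 13)),
    (date(2020, 6, 13), date(2020, 6, 14)),
    (date(2020, 6, 10), date(2020, 6, 12)),
    (date(2020, 6, 14), date(2020, 6, 14)),  # registered today: excluded
    (date(2020, 6, 1), date(2020, 6, 14)),   # older than 7 days: excluded
]

# create_date -> [retention_count, new_user_count]
cohorts = defaultdict(lambda: [0, 0])
for first, last in users:
    if stat_date - timedelta(days=7) <= first < stat_date:
        cohorts[first][1] += 1
        if last == stat_date:
            cohorts[first][0] += 1

rows = [
    {
        "create_date": first,
        "retention_day": (stat_date - first).days,
        "retention_count": kept,
        "new_user_count": total,
        "retention_rate": round(kept / total * 100, 2),
    }
    for first, (kept, total) in sorted(cohorts.items())
]
```

Each daily run therefore appends up to seven rows, one per cohort, with `retention_day` running from 1 to 7.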
# Product Topic
# Product Statistics
Table creation statement
DROP TABLE IF EXISTS ads_order_spu_stats;
CREATE EXTERNAL TABLE `ads_order_spu_stats`
(
    `dt` STRING COMMENT 'Statistics date',
    `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `spu_id` STRING COMMENT 'Product (SPU) ID',
    `spu_name` STRING COMMENT 'Product (SPU) name',
    `tm_id` STRING COMMENT 'Brand ID',
    `tm_name` STRING COMMENT 'Brand name',
    `category3_id` STRING COMMENT 'Level-3 category ID',
    `category3_name` STRING COMMENT 'Level-3 category name',
    `category2_id` STRING COMMENT 'Level-2 category ID',
    `category2_name` STRING COMMENT 'Level-2 category name',
    `category1_id` STRING COMMENT 'Level-1 category ID',
    `category1_name` STRING COMMENT 'Level-1 category name',
    `order_count` BIGINT COMMENT 'Number of orders',
    `order_amount` DECIMAL(16,2) COMMENT 'Order amount'
) COMMENT 'Product sales statistics'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_order_spu_stats/';
Data loading
insert overwrite table ads_order_spu_stats
select * from ads_order_spu_stats
union
select
    '2020-06-14' dt,
    recent_days,
    spu_id, spu_name,
    tm_id, tm_name,
    category3_id, category3_name,
    category2_id, category2_name,
    category1_id, category1_name,
    sum(order_count),
    sum(order_amount)
from
(
    select
        recent_days,
        sku_id,
        case
            when recent_days=1 then order_last_1d_count
            when recent_days=7 then order_last_7d_count
            when recent_days=30 then order_last_30d_count
        end order_count,
        case
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_amount
    from dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='2020-06-14'
)t1
left join
(
    -- roll SKUs up to their SPU, brand and category
    select
        id,
        spu_id, spu_name,
        tm_id, tm_name,
        category3_id, category3_name,
        category2_id, category2_name,
        category1_id, category1_name
    from dim_sku_info
    where dt='2020-06-14'
)t2
on t1.sku_id=t2.id
group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;
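# Brand Repeat Purchase Rate
The brand repeat purchase rate is the ratio of the number of people who bought a brand repeatedly within a period to the number of people who bought that brand at all. Buying repeatedly means a purchase count of at least 2; having bought means a purchase count of at least 1.
Here the repeat purchase rate of each brand is required for the last 1, 7 and 30 days.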
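As a worked example of the definition, the Python sketch below counts purchases per (user, brand) pair and then divides repeat buyers by all buyers for each brand. The order detail rows are hypothetical, and each row is treated as one purchase, which is an assumption of this sketch.

```python
from collections import Counter, defaultdict

# Hypothetical order detail rows inside one window: (user_id, tm_id).
order_details = [
    ("u1", "tm1"), ("u1", "tm1"),
    ("u2", "tm1"),
    ("u3", "tm1"), ("u3", "tm1"), ("u3", "tm1"),
    ("u1", "tm2"),
]

# Purchases per (user, brand).
purchases = Counter(order_details)

buyers = defaultdict(int)     # purchase count >= 1
repeaters = defaultdict(int)  # purchase count >= 2
for (user_id, tm_id), n in purchases.items():
    buyers[tm_id] += 1
    if n >= 2:
        repeaters[tm_id] += 1

# Percentage with two decimals, like cast(... * 100 as decimal(16,2)).
repeat_rate = {
    tm_id: round(repeaters[tm_id] / buyers[tm_id] * 100, 2) for tm_id in buyers
}
```

For brand `tm1`, three users bought and two of them bought at least twice, so the rate is 66.67 percent.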
Table creation statement
DROP TABLE IF EXISTS ads_repeat_purchase;
CREATE EXTERNAL TABLE `ads_repeat_purchase`
(
    `dt` STRING COMMENT 'Statistics date',
    `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `tm_id` STRING COMMENT 'Brand ID',
    `tm_name` STRING COMMENT 'Brand name',
    `order_repeat_rate` DECIMAL(16,2) COMMENT 'Repeat purchase rate'
) COMMENT 'Brand repeat purchase rate'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_repeat_purchase/';
Data loading
insert overwrite table ads_repeat_purchase
select * from ads_repeat_purchase
union
select
    '2020-06-14' dt,
    recent_days,
    tm_id,
    tm_name,
    cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
from
(
    -- purchases per user and brand
    select
        recent_days,
        user_id,
        tm_id,
        tm_name,
        sum(order_count) order_count
    from
    (
        select
            recent_days,
            user_id,
            sku_id,
            count(*) order_count
        from dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt>=date_add('2020-06-14',-29)
        and dt>=date_add('2020-06-14',-recent_days+1)
        group by recent_days, user_id,sku_id
    )t1
    left join
    (
        select id, tm_id, tm_name
        from dim_sku_info
        where dt='2020-06-14'
    )t2
    on t1.sku_id=t2.id
    group by recent_days,user_id,tm_id,tm_name
)t3
group by recent_days,tm_id,tm_name;
# Order Topic
# Order Statistics
Table creation statement
DROP TABLE IF EXISTS ads_order_total;
CREATE EXTERNAL TABLE `ads_order_total`
(
    `dt` STRING COMMENT 'Statistics date',
    `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `order_count` BIGINT COMMENT 'Number of orders',
    `order_amount` DECIMAL(16,2) COMMENT 'Order amount',
    `order_user_count` BIGINT COMMENT 'Number of ordering users'
) COMMENT 'Order statistics'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_order_total/';
Data loading
insert overwrite table ads_order_total
select * from ads_order_total
union
select
    '2020-06-14',
    recent_days,
    sum(order_count),
    sum(order_final_amount) order_final_amount,
    sum(if(order_final_amount>0,1,0)) order_user_count
from
(
    select
        recent_days,
        user_id,
        -- the recent_days=0 branches are never hit here, because only 1, 7 and 30 are exploded
        case
            when recent_days=0 then order_count
            when recent_days=1 then order_last_1d_count
            when recent_days=7 then order_last_7d_count
            when recent_days=30 then order_last_30d_count
        end order_count,
        case
            when recent_days=0 then order_final_amount
            when recent_days=1 then order_last_1d_final_amount
            when recent_days=7 then order_last_7d_final_amount
            when recent_days=30 then order_last_30d_final_amount
        end order_final_amount
    from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
    where dt='2020-06-14'
)t1
group by recent_days;
# Order Statistics by Region
Table creation statement
DROP TABLE IF EXISTS ads_order_by_province;
CREATE EXTERNAL TABLE `ads_order_by_province`
(
    `dt` STRING COMMENT 'Statistics date',
    `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
    `province_id` STRING COMMENT 'Province ID',
    `province_name` STRING COMMENT 'Province name',
    `area_code` STRING COMMENT 'Area code',
    `iso_code` STRING COMMENT 'International standard region code',
    `iso_code_3166_2` STRING COMMENT 'International standard region code (ISO 3166-2)',
    `order_count` BIGINT COMMENT 'Number of orders',
    `order_amount` DECIMAL(16,2) COMMENT 'Order amount'
) COMMENT 'Order statistics by region'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_order_by_province/';
Data loading
insert overwrite table ads_order_by_province
select * from ads_order_by_province
union
select
    dt,
    recent_days,
    province_id,
    province_name,
    area_code,
    iso_code,
    iso_3166_2,
    order_count,
    order_amount
from
(
    select
        '2020-06-14' dt,
        recent_days,
        province_id,
        sum(order_count) order_count,
        sum(order_amount) order_amount
    from
    (
        select
            recent_days,
            province_id,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_amount
        from dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days,province_id
)t2
join dim_base_province t3
on t2.province_id=t3.id;
# Coupon Topic
# Coupon Statistics
Table creation statement
DROP TABLE IF EXISTS ads_coupon_stats;
CREATE EXTERNAL TABLE ads_coupon_stats
(
    `dt` STRING COMMENT 'Statistics date',
    `coupon_id` STRING COMMENT 'Coupon ID',
    `coupon_name` STRING COMMENT 'Coupon name',
    `start_date` STRING COMMENT 'Release date',
    `rule_name` STRING COMMENT 'Discount rule, e.g. 10 yuan off orders of 100 yuan or more',
    `get_count` BIGINT COMMENT 'Number of times claimed',
    `order_count` BIGINT COMMENT 'Number of times used (in an order)',
    `expire_count` BIGINT COMMENT 'Number of times expired',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders that used the coupon',
    `order_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders that used the coupon',
    `reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount',
    `reduce_rate` DECIMAL(16,2) COMMENT 'Subsidy rate'
) COMMENT 'Coupon statistics'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/warehouse/gmall/ads/ads_coupon_stats/';
Data loading
insert overwrite table ads_coupon_stats
select * from ads_coupon_stats
union
select
    '2020-06-14' dt,
    t1.id,
    coupon_name,
    start_date,
    rule_name,
    get_count,
    order_count,
    expire_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        id,
        coupon_name,
        date_format(start_time,'yyyy-MM-dd') start_date,
        case
            when coupon_type='3201' then concat('满',condition_amount,'元减',benefit_amount,'元')
            when coupon_type='3202' then concat('满',condition_num,'件打', (1-benefit_discount)*10,'折')
            when coupon_type='3203' then concat('减',benefit_amount,'元')
        end rule_name
    from dim_coupon_info
    where dt='2020-06-14'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
)t1
left join
(
    select
        coupon_id,
        get_count,
        order_count,
        expire_count,
        order_original_amount,
        order_final_amount,
        order_reduce_amount reduce_amount,
        cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
    from dwt_coupon_topic
    where dt='2020-06-14'
)t2
on t1.id=t2.coupon_id;
# Activity topic
# Activity statistics
Table creation statement
DROP TABLE IF EXISTS ads_activity_stats;
CREATE EXTERNAL TABLE `ads_activity_stats`
(
    `dt`                    STRING COMMENT 'Statistics date',
    `activity_id`           STRING COMMENT 'Activity ID',
    `activity_name`         STRING COMMENT 'Activity name',
    `start_date`            STRING COMMENT 'Activity start date',
    `order_count`           BIGINT COMMENT 'Number of orders in the activity',
    `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders in the activity',
    `order_final_amount`    DECIMAL(16,2) COMMENT 'Final amount of orders in the activity',
    `reduce_amount`         DECIMAL(16,2) COMMENT 'Discount amount',
    `reduce_rate`           DECIMAL(16,2) COMMENT 'Subsidy rate'
) COMMENT 'Activity statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_activity_stats/';
Data loading
insert overwrite table ads_activity_stats
select * from ads_activity_stats
union
select
    '2020-06-14' dt,
    t4.activity_id,
    activity_name,
    start_date,
    order_count,
    order_original_amount,
    order_final_amount,
    reduce_amount,
    reduce_rate
from
(
    select
        activity_id,
        activity_name,
        date_format(start_time,'yyyy-MM-dd') start_date
    from dim_activity_rule_info
    where dt='2020-06-14'
    and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
    group by activity_id,activity_name,start_time
)t4
left join
(
    select
        activity_id,
        sum(order_count) order_count,
        sum(order_original_amount) order_original_amount,
        sum(order_final_amount) order_final_amount,
        sum(order_reduce_amount) reduce_amount,
        cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
    from dwt_activity_topic
    where dt='2020-06-14'
    group by activity_id
)t5
on t4.activity_id=t5.activity_id;
# ADS layer business data import script
Create the script dwt_to_ads.sh in the /home/damoncai/bin directory
#!/bin/bash APP=gmall # 如果是输入的日期按照取输入日期;如果没输入日期取当前时间的前一天 if [ -n "$2" ] ;then do_date=$2 else do_date=`date -d "-1 day" +%F` fi ads_activity_stats=" insert overwrite table ${APP}.ads_activity_stats select * from ${APP}.ads_activity_stats union select '$do_date' dt, t4.activity_id, activity_name, start_date, order_count, order_original_amount, order_final_amount, reduce_amount, reduce_rate from ( select activity_id, activity_name, date_format(start_time,'yyyy-MM-dd') start_date from ${APP}.dim_activity_rule_info where dt='$do_date' and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29) group by activity_id,activity_name,start_time )t4 left join ( select activity_id, sum(order_count) order_count, sum(order_original_amount) order_original_amount, sum(order_final_amount) order_final_amount, sum(order_reduce_amount) reduce_amount, cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate from ${APP}.dwt_activity_topic where dt='$do_date' group by activity_id )t5 on t4.activity_id=t5.activity_id; " ads_coupon_stats=" insert overwrite table ${APP}.ads_coupon_stats select * from ${APP}.ads_coupon_stats union select '$do_date' dt, t1.id, coupon_name, start_date, rule_name, get_count, order_count, expire_count, order_original_amount, order_final_amount, reduce_amount, reduce_rate from ( select id, coupon_name, date_format(start_time,'yyyy-MM-dd') start_date, case when coupon_type='3201' then concat('满',condition_amount,'元减',benefit_amount,'元') when coupon_type='3202' then concat('满',condition_num,'件打', (1-benefit_discount)*10,'折') when coupon_type='3203' then concat('减',benefit_amount,'元') end rule_name from ${APP}.dim_coupon_info where dt='$do_date' and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29) )t1 left join ( select coupon_id, get_count, order_count, expire_count, order_original_amount, order_final_amount, order_reduce_amount reduce_amount, cast(order_reduce_amount/order_original_amount as decimal(16,2)) 
reduce_rate from ${APP}.dwt_coupon_topic where dt='$do_date' )t2 on t1.id=t2.coupon_id; " ads_order_by_province=" insert overwrite table ${APP}.ads_order_by_province select * from ${APP}.ads_order_by_province union select dt, recent_days, province_id, province_name, area_code, iso_code, iso_3166_2, order_count, order_amount from ( select '$do_date' dt, recent_days, province_id, sum(order_count) order_count, sum(order_amount) order_amount from ( select recent_days, province_id, case when recent_days=1 then order_last_1d_count when recent_days=7 then order_last_7d_count when recent_days=30 then order_last_30d_count end order_count, case when recent_days=1 then order_last_1d_final_amount when recent_days=7 then order_last_7d_final_amount when recent_days=30 then order_last_30d_final_amount end order_amount from ${APP}.dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days where dt='$do_date' )t1 group by recent_days,province_id )t2 join ${APP}.dim_base_province t3 on t2.province_id=t3.id; " ads_order_spu_stats=" insert overwrite table ${APP}.ads_order_spu_stats select * from ${APP}.ads_order_spu_stats union select '$do_date' dt, recent_days, spu_id, spu_name, tm_id, tm_name, category3_id, category3_name, category2_id, category2_name, category1_id, category1_name, sum(order_count), sum(order_amount) from ( select recent_days, sku_id, case when recent_days=1 then order_last_1d_count when recent_days=7 then order_last_7d_count when recent_days=30 then order_last_30d_count end order_count, case when recent_days=1 then order_last_1d_final_amount when recent_days=7 then order_last_7d_final_amount when recent_days=30 then order_last_30d_final_amount end order_amount from ${APP}.dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days where dt='$do_date' )t1 left join ( select id, spu_id, spu_name, tm_id, tm_name, category3_id, category3_name, category2_id, category2_name, category1_id, category1_name from ${APP}.dim_sku_info where dt='$do_date' )t2 
on t1.sku_id=t2.id group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name; " ads_order_total=" insert overwrite table ${APP}.ads_order_total select * from ${APP}.ads_order_total union select '$do_date', recent_days, sum(order_count), sum(order_final_amount) order_final_amount, sum(if(order_final_amount>0,1,0)) order_user_count from ( select recent_days, user_id, case when recent_days=0 then order_count when recent_days=1 then order_last_1d_count when recent_days=7 then order_last_7d_count when recent_days=30 then order_last_30d_count end order_count, case when recent_days=0 then order_final_amount when recent_days=1 then order_last_1d_final_amount when recent_days=7 then order_last_7d_final_amount when recent_days=30 then order_last_30d_final_amount end order_final_amount from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days where dt='$do_date' )t1 group by recent_days; " ads_page_path=" insert overwrite table ${APP}.ads_page_path select * from ${APP}.ads_page_path union select '$do_date', recent_days, source, target, count(*) from ( select recent_days, concat('step-',step,':',source) source, concat('step-',step+1,':',target) target from ( select recent_days, page_id source, lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target, row_number() over (partition by recent_days,session_id order by ts) step from ( select recent_days, last_page_id, page_id, ts, concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id from ${APP}.dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days where dt>=date_add('$do_date',-30) and dt>=date_add('$do_date',-recent_days+1) )t2 )t3 )t4 group by recent_days,source,target; " ads_repeat_purchase=" insert overwrite table ${APP}.ads_repeat_purchase select * from ${APP}.ads_repeat_purchase union select '$do_date' dt, 
recent_days, tm_id, tm_name, cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2)) from ( select recent_days, user_id, tm_id, tm_name, sum(order_count) order_count from ( select recent_days, user_id, sku_id, count(*) order_count from ${APP}.dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days where dt>=date_add('$do_date',-29) and dt>=date_add('$do_date',-recent_days+1) group by recent_days, user_id,sku_id )t1 left join ( select id, tm_id, tm_name from ${APP}.dim_sku_info where dt='$do_date' )t2 on t1.sku_id=t2.id group by recent_days,user_id,tm_id,tm_name )t3 group by recent_days,tm_id,tm_name; " ads_user_action=" with tmp_page as ( select '$do_date' dt, recent_days, sum(if(array_contains(pages,'home'),1,0)) home_count, sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count from ( select recent_days, mid_id, collect_set(page_id) pages from ( select dt, mid_id, page.page_id from ${APP}.dws_visitor_action_daycount lateral view explode(page_stats) tmp as page where dt>=date_add('$do_date',-29) and page.page_id in('home','good_detail') )t1 lateral view explode(Array(1,7,30)) tmp as recent_days where dt>=date_add('$do_date',-recent_days+1) group by recent_days,mid_id )t2 group by recent_days ), tmp_cop as ( select '$do_date' dt, recent_days, sum(if(cart_count>0,1,0)) cart_count, sum(if(order_count>0,1,0)) order_count, sum(if(payment_count>0,1,0)) payment_count from ( select recent_days, user_id, case when recent_days=1 then cart_last_1d_count when recent_days=7 then cart_last_7d_count when recent_days=30 then cart_last_30d_count end cart_count, case when recent_days=1 then order_last_1d_count when recent_days=7 then order_last_7d_count when recent_days=30 then order_last_30d_count end order_count, case when recent_days=1 then payment_last_1d_count when recent_days=7 then payment_last_7d_count when recent_days=30 then payment_last_30d_count end payment_count from ${APP}.dwt_user_topic lateral view 
explode(Array(1,7,30)) tmp as recent_days where dt='$do_date' )t1 group by recent_days ) insert overwrite table ${APP}.ads_user_action select * from ${APP}.ads_user_action union select tmp_page.dt, tmp_page.recent_days, home_count, good_detail_count, cart_count, order_count, payment_count from tmp_page join tmp_cop on tmp_page.recent_days=tmp_cop.recent_days; " ads_user_change=" insert overwrite table ${APP}.ads_user_change select * from ${APP}.ads_user_change union select churn.dt, user_churn_count, user_back_count from ( select '$do_date' dt, count(*) user_churn_count from ${APP}.dwt_user_topic where dt='$do_date' and login_date_last=date_add('$do_date',-7) )churn join ( select '$do_date' dt, count(*) user_back_count from ( select user_id, login_date_last from ${APP}.dwt_user_topic where dt='$do_date' and login_date_last='$do_date' )t1 join ( select user_id, login_date_last login_date_previous from ${APP}.dwt_user_topic where dt=date_add('$do_date',-1) )t2 on t1.user_id=t2.user_id where datediff(login_date_last,login_date_previous)>=8 )back on churn.dt=back.dt; " ads_user_retention=" insert overwrite table ${APP}.ads_user_retention select * from ${APP}.ads_user_retention union select '$do_date', login_date_first create_date, datediff('$do_date',login_date_first) retention_day, sum(if(login_date_last='$do_date',1,0)) retention_count, count(*) new_user_count, cast(sum(if(login_date_last='$do_date',1,0))/count(*)*100 as decimal(16,2)) retention_rate from ${APP}.dwt_user_topic where dt='$do_date' and login_date_first>=date_add('$do_date',-7) and login_date_first<'$do_date' group by login_date_first; " ads_user_total=" insert overwrite table ${APP}.ads_user_total select * from ${APP}.ads_user_total union select '$do_date', recent_days, sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count, sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count, sum(order_final_amount) order_final_amount, sum(if(order_final_amount>0,1,0)) order_user_count, 
sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count from ( select recent_days, user_id, login_date_first, login_date_last, order_date_first, case when recent_days=0 then order_final_amount when recent_days=1 then order_last_1d_final_amount when recent_days=7 then order_last_7d_final_amount when recent_days=30 then order_last_30d_final_amount end order_final_amount, if(recent_days=0,'1970-01-01',date_add('$do_date',-recent_days+1)) recent_days_ago from ${APP}.dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days where dt='$do_date' )t1 group by recent_days; " ads_visit_stats=" insert overwrite table ${APP}.ads_visit_stats select * from ${APP}.ads_visit_stats union select '$do_date' dt, is_new, recent_days, channel, count(distinct(mid_id)) uv_count, cast(sum(duration)/1000 as bigint) duration_sec, cast(avg(duration)/1000 as bigint) avg_duration_sec, sum(page_count) page_count, cast(avg(page_count) as bigint) avg_page_count, count(*) sv_count, sum(if(page_count=1,1,0)) bounce_count, cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate from ( select session_id, mid_id, is_new, recent_days, channel, count(*) page_count, sum(during_time) duration from ( select mid_id, channel, recent_days, is_new, last_page_id, page_id, during_time, concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id from ( select mid_id, channel, last_page_id, page_id, during_time, ts, recent_days, if(visit_date_first>=date_add('$do_date',-recent_days+1),'1','0') is_new from ( select t1.mid_id, t1.channel, t1.last_page_id, t1.page_id, t1.during_time, t1.dt, t1.ts, t2.visit_date_first from ( select mid_id, channel, last_page_id, page_id, during_time, dt, ts from ${APP}.dwd_page_log where dt>=date_add('$do_date',-30) )t1 left join ( select mid_id, visit_date_first from ${APP}.dwt_visitor_topic where dt='$do_date' )t2 on t1.mid_id=t2.mid_id )t3 lateral 
view explode(Array(1,7,30)) tmp as recent_days where dt>=date_add('$do_date',-recent_days+1) )t4 )t5 group by session_id,mid_id,is_new,recent_days,channel )t6 group by is_new,recent_days,channel; " case $1 in "ads_activity_stats" ) hive -e "$ads_activity_stats" ;; "ads_coupon_stats" ) hive -e "$ads_coupon_stats" ;; "ads_order_by_province" ) hive -e "$ads_order_by_province" ;; "ads_order_spu_stats" ) hive -e "$ads_order_spu_stats" ;; "ads_order_total" ) hive -e "$ads_order_total" ;; "ads_page_path" ) hive -e "$ads_page_path" ;; "ads_repeat_purchase" ) hive -e "$ads_repeat_purchase" ;; "ads_user_action" ) hive -e "$ads_user_action" ;; "ads_user_change" ) hive -e "$ads_user_change" ;; "ads_user_retention" ) hive -e "$ads_user_retention" ;; "ads_user_total" ) hive -e "$ads_user_total" ;; "ads_visit_stats" ) hive -e "$ads_visit_stats" ;; "all" ) hive -e "$ads_activity_stats$ads_coupon_stats$ads_order_by_province$ads_order_spu_stats$ads_order_total$ads_page_path$ads_repeat_purchase$ads_user_action$ads_user_change$ads_user_retention$ads_user_total$ads_visit_stats" ;; esac
Add execute permission to the script
chmod +x dwt_to_ads.sh
Run the script
dwt_to_ads.sh all 2020-06-14
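The script takes the table name (or all) as its first argument and an optional date as its second; when the date is omitted it falls back to the previous day. The fallback can be sketched in isolation as below. This is a minimal sketch, assuming GNU date (for the -d option); the function name resolve_date is hypothetical and does not appear in the script.

```shell
#!/bin/bash
# resolve_date is a hypothetical helper that mirrors the date handling at the
# top of dwt_to_ads.sh. It assumes GNU date, which supports -d.
# Usage: resolve_date 2020-06-14   (prints 2020-06-14)
#        resolve_date              (prints the previous day as YYYY-MM-DD)
resolve_date() {
    if [ -n "$1" ]; then
        echo "$1"                 # an explicit date was passed: use it as-is
    else
        date -d "-1 day" +%F      # no date given: default to the previous day
    fi
}
```

When a scheduler runs the script shortly after midnight, the default therefore selects the partition of the day that has just ended.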
# Azkaban
# Installation and usage
# Create the database and tables
Create the gmall_report database
Create the tables
DROP TABLE IF EXISTS ads_visit_stats; CREATE TABLE `ads_visit_stats` ( `dt` DATE NOT NULL COMMENT '统计日期', `is_new` VARCHAR(255) NOT NULL COMMENT '新老标识,1:新,0:老', `recent_days` INT NOT NULL COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天', `channel` VARCHAR(255) NOT NULL COMMENT '渠道', `uv_count` BIGINT(20) DEFAULT NULL COMMENT '日活(访问人数)', `duration_sec` BIGINT(20) DEFAULT NULL COMMENT '页面停留总时长', `avg_duration_sec` BIGINT(20) DEFAULT NULL COMMENT '一次会话,页面停留平均时长', `page_count` BIGINT(20) DEFAULT NULL COMMENT '页面总浏览数', `avg_page_count` BIGINT(20) DEFAULT NULL COMMENT '一次会话,页面平均浏览数', `sv_count` BIGINT(20) DEFAULT NULL COMMENT '会话次数', `bounce_count` BIGINT(20) DEFAULT NULL COMMENT '跳出数', `bounce_rate` DECIMAL(16,2) DEFAULT NULL COMMENT '跳出率', PRIMARY KEY (`dt`,`recent_days`,`is_new`,`channel`) ) ENGINE=INNODB DEFAULT CHARSET=utf8; DROP TABLE IF EXISTS ads_page_path; CREATE TABLE `ads_page_path` ( `dt` DATE NOT NULL COMMENT '统计日期', `recent_days` BIGINT(20) NOT NULL COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天', `source` VARCHAR(255) DEFAULT NULL COMMENT '跳转起始页面', `target` VARCHAR(255) DEFAULT NULL COMMENT '跳转终到页面', `path_count` BIGINT(255) DEFAULT NULL COMMENT '跳转次数', UNIQUE KEY (`dt`,`recent_days`,`source`,`target`) USING BTREE ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC; DROP TABLE IF EXISTS ads_user_total; CREATE TABLE `ads_user_total` ( `dt` DATE NOT NULL COMMENT '统计日期', `recent_days` BIGINT(20) NOT NULL COMMENT '最近天数,0:累积值,1:最近1天,7:最近7天,30:最近30天', `new_user_count` BIGINT(20) DEFAULT NULL COMMENT '新注册用户数', `new_order_user_count` BIGINT(20) DEFAULT NULL COMMENT '新增下单用户数', `order_final_amount` DECIMAL(16,2) DEFAULT NULL COMMENT '下单总金额', `order_user_count` BIGINT(20) DEFAULT NULL COMMENT '下单用户数', `no_order_user_count` BIGINT(20) DEFAULT NULL COMMENT '未下单用户数(具体指活跃用户中未下单用户)', PRIMARY KEY (`dt`,`recent_days`) ) ENGINE=INNODB DEFAULT CHARSET=utf8; DROP TABLE IF EXISTS ads_user_change; CREATE TABLE `ads_user_change` ( `dt` DATE NOT NULL COMMENT '统计日期', `user_churn_count` BIGINT(20) 
DEFAULT NULL COMMENT '流失用户数', `user_back_count` BIGINT(20) DEFAULT NULL COMMENT '回流用户数', PRIMARY KEY (`dt`) ) ENGINE=INNODB DEFAULT CHARSET=utf8; DROP TABLE IF EXISTS ads_user_action; CREATE TABLE `ads_user_action` ( `dt` DATE NOT NULL COMMENT '统计日期', `recent_days` BIGINT(20) NOT NULL COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天', `home_count` BIGINT(20) DEFAULT NULL COMMENT '浏览首页人数', `good_detail_count` BIGINT(20) DEFAULT NULL COMMENT '浏览商品详情页人数', `cart_count` BIGINT(20) DEFAULT NULL COMMENT '加入购物车人数', `order_count` BIGINT(20) DEFAULT NULL COMMENT '下单人数', `payment_count` BIGINT(20) DEFAULT NULL COMMENT '支付人数', PRIMARY KEY (`dt`,`recent_days`) USING BTREE ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC; DROP TABLE IF EXISTS ads_user_retention; CREATE TABLE `ads_user_retention` ( `dt` DATE DEFAULT NULL COMMENT '统计日期', `create_date` VARCHAR(255) NOT NULL COMMENT '用户新增日期', `retention_day` BIGINT(20) NOT NULL COMMENT '截至当前日期留存天数', `retention_count` BIGINT(20) DEFAULT NULL COMMENT '留存用户数量', `new_user_count` BIGINT(20) DEFAULT NULL COMMENT '新增用户数量', `retention_rate` DECIMAL(16,2) DEFAULT NULL COMMENT '留存率', PRIMARY KEY (`create_date`,`retention_day`) USING BTREE ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC; DROP TABLE IF EXISTS ads_order_total; CREATE TABLE `ads_order_total` ( `dt` DATE NOT NULL COMMENT '统计日期', `recent_days` BIGINT(20) NOT NULL COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天', `order_count` BIGINT(255) DEFAULT NULL COMMENT '订单数', `order_amount` DECIMAL(16,2) DEFAULT NULL COMMENT '订单金额', `order_user_count` BIGINT(255) DEFAULT NULL COMMENT '下单人数', PRIMARY KEY (`dt`,`recent_days`) ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC; DROP TABLE IF EXISTS ads_order_by_province; CREATE TABLE `ads_order_by_province` ( `dt` DATE NOT NULL, `recent_days` BIGINT(20) NOT NULL COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天', `province_id` VARCHAR(255) NOT NULL COMMENT '统计日期', `province_name` VARCHAR(255) DEFAULT NULL COMMENT '省份名称', `area_code` VARCHAR(255) DEFAULT NULL 
COMMENT '地区编码', `iso_code` VARCHAR(255) DEFAULT NULL COMMENT '国际标准地区编码', `iso_code_3166_2` VARCHAR(255) DEFAULT NULL COMMENT '国际标准地区编码', `order_count` BIGINT(20) DEFAULT NULL COMMENT '订单数', `order_amount` DECIMAL(16,2) DEFAULT NULL COMMENT '订单金额', PRIMARY KEY (`dt`, `recent_days` ,`province_id`) USING BTREE ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC; DROP TABLE IF EXISTS ads_repeat_purchase; CREATE TABLE `ads_repeat_purchase` ( `dt` DATE NOT NULL COMMENT '统计日期', `recent_days` BIGINT(20) NOT NULL COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天', `tm_id` VARCHAR(255) NOT NULL COMMENT '品牌ID', `tm_name` VARCHAR(255) DEFAULT NULL COMMENT '品牌名称', `order_repeat_rate` DECIMAL(16,2) DEFAULT NULL COMMENT '复购率', PRIMARY KEY (`dt` ,`recent_days`,`tm_id`) ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC; DROP TABLE IF EXISTS ads_order_spu_stats; CREATE TABLE `ads_order_spu_stats` ( `dt` DATE NOT NULL COMMENT '统计日期', `recent_days` BIGINT(20) NOT NULL COMMENT '最近天数,1:最近1天,7:最近7天,30:最近30天', `spu_id` VARCHAR(255) NOT NULL COMMENT '商品ID', `spu_name` VARCHAR(255) DEFAULT NULL COMMENT '商品名称', `tm_id` VARCHAR(255) NOT NULL COMMENT '品牌ID', `tm_name` VARCHAR(255) DEFAULT NULL COMMENT '品牌名称', `category3_id` VARCHAR(255) NOT NULL COMMENT '三级品类ID', `category3_name` VARCHAR(255) DEFAULT NULL COMMENT '三级品类名称', `category2_id` VARCHAR(255) NOT NULL COMMENT '二级品类ID', `category2_name` VARCHAR(255) DEFAULT NULL COMMENT '二级品类名称', `category1_id` VARCHAR(255) NOT NULL COMMENT '一级品类ID', `category1_name` VARCHAR(255) NOT NULL COMMENT '一级品类名称', `order_count` BIGINT(20) DEFAULT NULL COMMENT '订单数', `order_amount` DECIMAL(16,2) DEFAULT NULL COMMENT '订单金额', PRIMARY KEY (`dt`,`recent_days`,`spu_id`) ) ENGINE=INNODB DEFAULT CHARSET=utf8; DROP TABLE IF EXISTS ads_activity_stats; CREATE TABLE `ads_activity_stats` ( `dt` DATE NOT NULL COMMENT '统计日期', `activity_id` VARCHAR(255) NOT NULL COMMENT '活动ID', `activity_name` VARCHAR(255) DEFAULT NULL COMMENT '活动名称', `start_date` DATE DEFAULT NULL COMMENT 
'开始日期', `order_count` BIGINT(11) DEFAULT NULL COMMENT '参与活动订单数', `order_original_amount` DECIMAL(16,2) DEFAULT NULL COMMENT '参与活动订单原始金额', `order_final_amount` DECIMAL(16,2) DEFAULT NULL COMMENT '参与活动订单最终金额', `reduce_amount` DECIMAL(16,2) DEFAULT NULL COMMENT '优惠金额', `reduce_rate` DECIMAL(16,2) DEFAULT NULL COMMENT '补贴率', PRIMARY KEY (`dt`,`activity_id` ) ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC; DROP TABLE IF EXISTS ads_coupon_stats; CREATE TABLE `ads_coupon_stats` ( `dt` DATE NOT NULL COMMENT '统计日期', `coupon_id` VARCHAR(255) NOT NULL COMMENT '优惠券ID', `coupon_name` VARCHAR(255) DEFAULT NULL COMMENT '优惠券名称', `start_date` DATE DEFAULT NULL COMMENT '开始日期', `rule_name` VARCHAR(200) DEFAULT NULL COMMENT '优惠规则', `get_count` BIGINT(20) DEFAULT NULL COMMENT '领取次数', `order_count` BIGINT(20) DEFAULT NULL COMMENT '使用(下单)次数', `expire_count` BIGINT(20) DEFAULT NULL COMMENT '过期次数', `order_original_amount` DECIMAL(16,2) DEFAULT NULL COMMENT '使用优惠券订单原始金额', `order_final_amount` DECIMAL(16,2) DEFAULT NULL COMMENT '使用优惠券订单最终金额', `reduce_amount` DECIMAL(16,2) DEFAULT NULL COMMENT '优惠金额', `reduce_rate` DECIMAL(16,2) DEFAULT NULL COMMENT '补贴率', PRIMARY KEY (`dt`,`coupon_id` ) ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
# Sqoop export script
Create the script hdfs_to_mysql.sh in the /home/damoncai/bin directory
#!/bin/bash

hive_db_name=gmall
mysql_db_name=gmall_report

export_data() {
/opt/module/sqoop/bin/sqoop export \
--connect "jdbc:mysql://ha01:3306/${mysql_db_name}?useUnicode=true&characterEncoding=utf-8" \
--username root \
--password 000000 \
--table $1 \
--num-mappers 1 \
--export-dir /warehouse/$hive_db_name/ads/$1 \
--input-fields-terminated-by "\t" \
--update-mode allowinsert \
--update-key $2 \
--input-null-string '\\N' \
--input-null-non-string '\\N'
}

case $1 in
  "ads_activity_stats" )
    export_data "ads_activity_stats" "dt,activity_id"
  ;;
  "ads_coupon_stats" )
    export_data "ads_coupon_stats" "dt,coupon_id"
  ;;
  "ads_order_by_province" )
    export_data "ads_order_by_province" "dt,recent_days,province_id"
  ;;
  "ads_order_spu_stats" )
    export_data "ads_order_spu_stats" "dt,recent_days,spu_id"
  ;;
  "ads_order_total" )
    export_data "ads_order_total" "dt,recent_days"
  ;;
  "ads_page_path" )
    export_data "ads_page_path" "dt,recent_days,source,target"
  ;;
  "ads_repeat_purchase" )
    export_data "ads_repeat_purchase" "dt,recent_days,tm_id"
  ;;
  "ads_user_action" )
    export_data "ads_user_action" "dt,recent_days"
  ;;
  "ads_user_change" )
    export_data "ads_user_change" "dt"
  ;;
  "ads_user_retention" )
    export_data "ads_user_retention" "create_date,retention_day"
  ;;
  "ads_user_total" )
    export_data "ads_user_total" "dt,recent_days"
  ;;
  "ads_visit_stats" )
    export_data "ads_visit_stats" "dt,recent_days,is_new,channel"
  ;;
  "all" )
    export_data "ads_activity_stats" "dt,activity_id"
    export_data "ads_coupon_stats" "dt,coupon_id"
    export_data "ads_order_by_province" "dt,recent_days,province_id"
    export_data "ads_order_spu_stats" "dt,recent_days,spu_id"
    export_data "ads_order_total" "dt,recent_days"
    export_data "ads_page_path" "dt,recent_days,source,target"
    export_data "ads_repeat_purchase" "dt,recent_days,tm_id"
    export_data "ads_user_action" "dt,recent_days"
    export_data "ads_user_change" "dt"
    export_data "ads_user_retention" "create_date,retention_day"
    export_data "ads_user_total" "dt,recent_days"
    export_data "ads_visit_stats" "dt,recent_days,is_new,channel"
  ;;
esac
On whether the export performs an update or an insert
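--update-mode:
updateonly: only updates existing rows; new rows cannot be inserted
allowinsert: rows that match no existing row are inserted
--update-key: when updating is enabled, names the columns used to decide whether an exported row and an existing row are the same record; a matching row is updated rather than added. Separate multiple columns with commas.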
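The interaction between the two options can be modelled on plain tab-separated files. The sketch below is a toy model only: it does not call Sqoop, and the function name merge_rows and its argument layout are hypothetical. The first NKEYS columns of each row stand in for the --update-key columns.

```shell
#!/bin/bash
# Toy model of --update-mode / --update-key. It does not call Sqoop.
# merge_rows MODE NKEYS TABLE_FILE EXPORT_FILE
#   MODE         updateonly or allowinsert
#   NKEYS        the first NKEYS tab-separated columns act as the update key
#   TABLE_FILE   rows already present in the target table
#   EXPORT_FILE  rows being exported
merge_rows() {
    awk -F'\t' -v mode="$1" -v nkeys="$2" '
        # Build the match key from the first nkeys columns of the current row.
        function rowkey(    i, k) {
            k = $1
            for (i = 2; i <= nkeys; i++) k = k FS $i
            return k
        }
        # First file: load the existing table rows and remember their order.
        FILENAME == ARGV[1] {
            k = rowkey(); rows[k] = $0; order[++n] = k
            next
        }
        # Second file: apply each exported row.
        {
            k = rowkey()
            if (k in rows) {
                rows[k] = $0                    # key matches: update in place
            } else if (mode == "allowinsert") {
                rows[k] = $0; order[++n] = k    # no match: insert a new row
            }                                   # updateonly: unmatched row is skipped
        }
        END { for (i = 1; i <= n; i++) print rows[order[i]] }
    ' "$3" "$4"
}
```

With keys dt,recent_days (NKEYS=2), re-exporting a day that is already in the table overwrites its rows instead of duplicating them, which is why the export script passes allowinsert together with each table's key columns.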
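--input-null-string and --input-null-non-string:
These specify, for string columns and non-string columns respectively, which string in the exported files is to be interpreted as NULL.
Hive stores NULL as "\N" in its underlying files, whereas in MySQL NULL is simply NULL. To keep the data consistent on both sides, use --input-null-string and --input-null-non-string when exporting, and --null-string and --null-non-string when importing.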
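To see what the two options are matching against, it can help to look at the raw ADS files with the "\N" markers made visible. The filter below is a small sketch for that purpose; mark_nulls is a hypothetical name, and the filter only imitates the mapping that the options ask Sqoop to apply.

```shell
#!/bin/bash
# Illustration only: mark_nulls is a hypothetical filter, not a Sqoop feature.
# It reads tab-separated rows on stdin and replaces every field that is exactly
# \N (the marker Hive text files use for NULL) with the word NULL.
mark_nulls() {
    awk -F'\t' 'BEGIN { OFS = "\t" }
    {
        for (i = 1; i <= NF; i++)
            if ($i == "\\N") $i = "NULL"
        print
    }'
}
```

Only fields that are exactly "\N" are rewritten; every other value passes through unchanged.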
Add execute permission
chmod +x hdfs_to_mysql.sh
Run the sqoop script
hdfs_to_mysql.sh all
# Full scheduling workflow
# Data preparation
# User behavior data
Edit application.properties under /opt/module/applog
#Business date
mock.date=2020-06-15
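When the date has to be changed for every test run, the edit can be scripted. A minimal sketch, assuming GNU sed (for in-place -i) and a properties file that already contains a mock.date= line; the helper name set_mock_date is hypothetical.

```shell
#!/bin/bash
# set_mock_date is a hypothetical helper. It assumes GNU sed (in-place -i)
# and that the target file already contains a line starting with mock.date=
# Usage: set_mock_date 2020-06-15 /opt/module/applog/application.properties
set_mock_date() {
    sed -i "s/^mock\.date=.*/mock.date=$1/" "$2"
}
```

The helper only rewrites the existing line; it does not add one if the key is missing.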
Note: distribute the file to the other nodes that need to generate data
xsync application.properties
Generate the data
lg.sh
Note: after generating the data, remember to check that it exists on HDFS!
# Business data preparation
Edit application.properties under /opt/module/db_log
#Business date
mock.date=2020-06-15
Generate the data
java -jar gmall2020-mock-db-2020-04-01.jar
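Query the order_info table and confirm that it contains rows whose operate_time falls on 2020-06-15
# Write the Azkaban workflow configuration files
Write the azkaban.project file with the following content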
azkaban-flow-version: 2.0
Write the gmall.flow file with the following content
nodes:
  - name: mysql_to_hdfs
    type: command
    config:
      command: /home/damoncai/bin/mysql_to_hdfs.sh all ${dt}

  - name: hdfs_to_ods_log
    type: command
    config:
      command: /home/damoncai/bin/hdfs_to_ods_log.sh ${dt}

  - name: hdfs_to_ods_db
    type: command
    dependsOn:
      - mysql_to_hdfs
    config:
      command: /home/damoncai/bin/hdfs_to_ods_db.sh all ${dt}

  - name: ods_to_dim_db
    type: command
    dependsOn:
      - hdfs_to_ods_db
    config:
      command: /home/damoncai/bin/ods_to_dim_db.sh all ${dt}

  - name: ods_to_dwd_log
    type: command
    dependsOn:
      - hdfs_to_ods_log
    config:
      command: /home/damoncai/bin/ods_to_dwd_log.sh all ${dt}

  - name: ods_to_dwd_db
    type: command
    dependsOn:
      - hdfs_to_ods_db
    config:
      command: /home/damoncai/bin/ods_to_dwd_db.sh all ${dt}

  - name: dwd_to_dws
    type: command
    dependsOn:
      - ods_to_dim_db
      - ods_to_dwd_log
      - ods_to_dwd_db
    config:
      command: /home/damoncai/bin/dwd_to_dws.sh all ${dt}

  - name: dws_to_dwt
    type: command
    dependsOn:
      - dwd_to_dws
    config:
      command: /home/damoncai/bin/dws_to_dwt.sh all ${dt}

  - name: dwt_to_ads
    type: command
    dependsOn:
      - dws_to_dwt
    config:
      command: /home/damoncai/bin/dwt_to_ads.sh all ${dt}

  - name: hdfs_to_mysql
    type: command
    dependsOn:
      - dwt_to_ads
    config:
      command: /home/damoncai/bin/hdfs_to_mysql.sh all
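The dependsOn entries in the flow form a dependency graph. One way to sanity-check it before uploading is to feed the edges to tsort, which prints one valid execution order and reports an error if there is a cycle. The sketch below hard-codes the edges from the flow file; the function name flow_order is hypothetical, and Azkaban itself may run independent nodes in parallel rather than in this exact sequence.

```shell
#!/bin/bash
# flow_order is a hypothetical helper. Each line passed to tsort has the form
# "upstream downstream" and is copied from the dependsOn entries in gmall.flow.
flow_order() {
    tsort <<'EOF'
mysql_to_hdfs hdfs_to_ods_db
hdfs_to_ods_log ods_to_dwd_log
hdfs_to_ods_db ods_to_dim_db
hdfs_to_ods_db ods_to_dwd_db
ods_to_dim_db dwd_to_dws
ods_to_dwd_log dwd_to_dws
ods_to_dwd_db dwd_to_dws
dwd_to_dws dws_to_dwt
dws_to_dwt dwt_to_ads
dwt_to_ads hdfs_to_mysql
EOF
}
```

Whatever order tsort chooses among the independent branches, hdfs_to_mysql comes last, because every other node is upstream of it.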
Compress azkaban.project and gmall.flow into a single zip file; the file name must be in English (gmall.zip).
Create a new project in the WebServer: http://ha01:8081/index
Upload the gmall.zip file
View the job flow
Detailed view of the job flow
Set the dt date parameter
Check the data in MySQL
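# Notes on Azkaban multi-Executor mode
Azkaban's multi-Executor mode means that Executors are deployed on several nodes of the cluster. In this mode the Azkaban Web Server picks one Executor, according to its selection policy, to run each job.
Because the scripts we hand to Azkaban, and the applications those scripts need (Hive, Sqoop, and so on), are deployed only on ha01, we must adopt one of the two options below for the jobs to run successfully; Option 2 is recommended.
Option 1: designate a specific Executor (ha01) to run the jobs.
In the executors table of the azkaban database in MySQL, look up the id of the Executor on ha01.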
mysql> use azkaban;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select * from executors;
+----+-----------+-------+--------+
| id | host      | port  | active |
+----+-----------+-------+--------+
|  1 | hadoop103 | 35985 |      1 |
|  2 | hadoop104 | 36363 |      1 |
|  3 | hadoop102 | 12321 |      1 |
+----+-----------+-------+--------+
When executing the workflow, add the useExecutor property, set to that Executor's id.
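Looking up the id can be scripted by filtering the table that the mysql client prints. The sketch below is an assumption-laden illustration: executor_id is a hypothetical helper, and it expects the boxed table layout on stdin (which the mysql client produces interactively, or in batch mode with its -t option).

```shell
#!/bin/bash
# executor_id is a hypothetical helper. It reads the boxed table printed by
# the mysql client on stdin and prints the id of the row whose host column
# equals the first argument.
# Usage (assumes the mysql client's -t option for table output):
#   mysql -t -uroot -p azkaban -e 'select * from executors' | executor_id ha01
executor_id() {
    awk -F'|' -v host="$1" '
        NF >= 5 {
            id = $2; h = $3
            gsub(/ /, "", id); gsub(/ /, "", h)   # strip the column padding
            if (h == host) print id
        }'
}
```

The printed id is the value to supply for useExecutor when the flow is executed; if the host is not registered, the helper prints nothing.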
Option 2: deploy the scripts and applications the jobs need on every node that runs an Executor
Distribute the scripts, Hive, Sqoop, Spark, and my_env.sh
xsync /home/atguigu/bin/
xsync /opt/module/hive
xsync /opt/module/sqoop
xsync /opt/module/spark
sudo /home/atguigu/bin/xsync /etc/profile.d/my_env.sh
After distribution, reload the environment variable configuration file on ha02 and ha03, and restart Azkaban