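Data Warehouse 4.0

1/6/2022 Data Warehouse

# Data Architecture

# Data Generation Module

# Servers

ha01(192.168.220.201) ha02(192.168.220.202) ha03(192.168.220.203)

# Passwordless SSH Login

Run the following commands on all three machines: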

ssh-keygen -t rsa

ssh-copy-id 192.168.220.201
ssh-copy-id 192.168.220.202
ssh-copy-id 192.168.220.203

# Cluster Distribution Script

  1. Create a bin directory under the user's home directory, /home/damoncai

    mkdir bin

  2. Create the script file

    cd /home/damoncai/bin
    vim xsync

    #!/bin/bash
    # 1. Check the number of arguments
    if [ $# -lt 1 ]
    then
      echo "Not enough arguments!"
      exit 1
    fi
    # 2. Loop over every machine in the cluster
    for host in 192.168.220.201 192.168.220.202 192.168.220.203
    do
      echo "====================  $host  ===================="
      # 3. Loop over every file or directory and send each one
      for file in "$@"
      do
        # 4. Check that the file exists
        if [ -e "$file" ]
        then
          # 5. Resolve the parent directory (following symlinks)
          pdir=$(cd -P "$(dirname "$file")"; pwd)
          # 6. Get the name of the current file
          fname=$(basename "$file")
          ssh "$host" "mkdir -p $pdir"
          rsync -av "$pdir/$fname" "$host:$pdir"
        else
          echo "$file does not exist!"
        fi
      done
    done

  3. Make the xsync script executable

    chmod +x xsync

  4. Test the script

    xsync xsync

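# Environment Variable Configuration

On Linux, environment variables can be set in several files, such as /etc/profile, /etc/profile.d/*.sh, ~/.bashrc, and ~/.bash_profile. The relationship and differences between these files are explained below.

Bash runs either as a login shell or as a non-login shell.

For example, logging in to the system from a terminal with a username and password gives you a login shell, whereas running "ssh hadoop103 command" executes command on hadoop103 in a non-login shell.

The main difference between the two is which configuration files they load at startup: a login shell loads /etc/profile, ~/.bash_profile, and ~/.bashrc, while a non-login shell loads only ~/.bashrc.

Loading either ~/.bashrc (more precisely /etc/bashrc, which ~/.bashrc sources) or /etc/profile runs a loop that sources the scripts matching /etc/profile.d/*.sh.

Therefore, whether the shell is a login shell or a non-login shell, the environment variables in /etc/profile.d/*.sh are loaded at startup.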

# JDK

  1. Uninstall any existing JDK

    sudo rpm -qa | grep -i java | xargs -n1 sudo rpm -e --nodeps

  2. Upload and extract the archive

  3. Add the environment variables

    sudo vim /etc/profile.d/my_env.sh

    #JAVA_HOME
    export JAVA_HOME=/opt/module/jdk1.8.0_212
    export PATH=$PATH:$JAVA_HOME/bin

  4. Apply the environment variables

    source /etc/profile.d/my_env.sh

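# Cluster Log Generation Script

Run the log generation script on two machines, ha01 and ha02.

  1. Create the script lg.sh in the /home/damoncai/bin directory

  2. Write the following content in the script

    #!/bin/bash
    for i in ha01 ha02; do
        echo "========== $i =========="
        ssh $i "cd /opt/module/applog/; java -jar gmall2020-mock-log-2021-01-22.jar >/dev/null 2>&1 &"
    done

    Notes

    1. /opt/module/applog/ is the directory that contains the jar and its configuration files

    2. /dev/null is the Linux null device. Anything written to it is discarded, which is why it is nicknamed the "black hole".

      Standard input 0: reads from the keyboard, /proc/self/fd/0

      Standard output 1: writes to the screen (the console), /proc/self/fd/1

      Standard error 2: writes to the screen (the console), /proc/self/fd/2

  3. Make the script executable

  4. Upload the jar and configuration files to /opt/module/applog/ on ha02

  5. Run the script

  6. Check the log data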

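# Log Data Collection Module

# Script for Viewing All Cluster Processes

  1. Create the script xcall.sh in the /home/damoncai/bin directory

  2. Edit the script content

    #!/bin/bash

    for i in ha01 ha02 ha03
    do
        echo "--------- $i ----------"
        ssh $i "$*"
    done

  3. Make the script executable

  4. Run the script

    xcall.sh jps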

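# Hadoop Installation

|      | ha01               | ha02                         | ha03                        |
| ---- | ------------------ | ---------------------------- | --------------------------- |
| HDFS | NameNode, DataNode | DataNode                     | DataNode, SecondaryNameNode |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                 |

Note: do not install the NameNode and the SecondaryNameNode on the same server.

Note: the ResourceManager also consumes a lot of memory, so do not place it on the same machine as the NameNode or the SecondaryNameNode.

# Fully Distributed Mode (Development Focus)

1) Prepare three machines (firewall disabled, static IP, hostname set)

2) Install the JDK

3) Configure the environment variables

4) Install Hadoop

5) Configure the environment variables

6) Configure the cluster

7) Start the daemons one by one

8) Configure ssh

9) Start the whole cluster and test it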

# Steps

  1. Upload the Hadoop archive and extract it

  2. Add Hadoop to the environment variables

    sudo vim /etc/profile.d/my_env.sh

    #HADOOP_HOME
    export HADOOP_HOME=/opt/module/hadoop-3.1.3
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin

  3. Distribute the environment variable file

    sudo /home/damoncai/bin/xsync /etc/profile.d/my_env.sh

  4. Source the file on all three nodes so the change takes effect

    source /etc/profile.d/my_env.sh

  5. Configure the cluster

    1. Core configuration file (core-site.xml)

      vim core-site.xml

      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

      <configuration>
          <!-- Address of the NameNode -->
          <property>
              <name>fs.defaultFS</name>
              <value>hdfs://ha01:8020</value>
          </property>
          <!-- Storage directory for Hadoop data -->
          <property>
              <name>hadoop.tmp.dir</name>
              <value>/opt/module/hadoop-3.1.3/data</value>
          </property>

          <!-- Static user for the HDFS web UI: damoncai -->
          <property>
              <name>hadoop.http.staticuser.user</name>
              <value>damoncai</value>
          </property>

          <!-- Hosts from which damoncai (superuser) may act as a proxy -->
          <property>
              <name>hadoop.proxyuser.damoncai.hosts</name>
              <value>*</value>
          </property>
          <!-- Groups whose users damoncai (superuser) may proxy -->
          <property>
              <name>hadoop.proxyuser.damoncai.groups</name>
              <value>*</value>
          </property>
          <!-- Users that damoncai (superuser) may proxy -->
          <property>
              <name>hadoop.proxyuser.damoncai.users</name>
              <value>*</value>
          </property>
      </configuration>

    2. HDFS configuration file (hdfs-site.xml)

      vim hdfs-site.xml

      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

      <configuration>
          <!-- NameNode web UI address -->
          <property>
              <name>dfs.namenode.http-address</name>
              <value>ha01:9870</value>
          </property>

          <!-- SecondaryNameNode web UI address -->
          <property>
              <name>dfs.namenode.secondary.http-address</name>
              <value>ha03:9868</value>
          </property>

          <!-- Use an HDFS replication factor of 1 in the test environment -->
          <property>
              <name>dfs.replication</name>
              <value>1</value>
          </property>
      </configuration>

    3. YARN configuration file (yarn-site.xml)

      vim yarn-site.xml

      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

      <configuration>
          <!-- Use the shuffle service for MapReduce -->
          <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
          </property>

          <!-- Address of the ResourceManager -->
          <property>
              <name>yarn.resourcemanager.hostname</name>
              <value>ha02</value>
          </property>

          <!-- Environment variables inherited by containers -->
          <property>
              <name>yarn.nodemanager.env-whitelist</name>
              <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
          </property>

          <!-- Minimum and maximum memory that can be allocated to a YARN container -->
          <property>
              <name>yarn.scheduler.minimum-allocation-mb</name>
              <value>512</value>
          </property>
          <property>
              <name>yarn.scheduler.maximum-allocation-mb</name>
              <value>4096</value>
          </property>
          <!-- Physical memory the NodeManager may manage for containers -->
          <property>
              <name>yarn.nodemanager.resource.memory-mb</name>
              <value>4096</value>
          </property>

          <!-- Disable YARN's virtual memory limit check -->
          <property>
              <name>yarn.nodemanager.vmem-check-enabled</name>
              <value>false</value>
          </property>
      </configuration>

    4. MapReduce configuration file (mapred-site.xml)

      vim mapred-site.xml

      <?xml version="1.0" encoding="UTF-8"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

      <configuration>
          <!-- Run MapReduce programs on YARN -->
          <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
          </property>
      </configuration>

    5. Configure workers

      vim /opt/module/hadoop-3.1.3/etc/hadoop/workers

      ha01
      ha02
      ha03

      Note: entries in this file must not have trailing spaces, and the file must not contain blank lines.

  6. Configure the history server

    1. mapred-site.xml

      <!-- History server address -->
      <property>
          <name>mapreduce.jobhistory.address</name>
          <value>ha01:10020</value>
      </property>

      <!-- History server web UI address -->
      <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>ha01:19888</value>
      </property>

  7. Configure log aggregation

    Log aggregation: after an application finishes, its run logs are uploaded to HDFS.

    Benefit of log aggregation: the details of a program run are easy to inspect, which helps development and debugging.

    Note: enabling log aggregation requires restarting the NodeManager, ResourceManager, and HistoryServer.

    1. Configure yarn-site.xml

      <!-- Enable log aggregation -->
      <property>
          <name>yarn.log-aggregation-enable</name>
          <value>true</value>
      </property>

      <!-- URL of the log aggregation server -->
      <property>
          <name>yarn.log.server.url</name>
          <value>http://ha01:19888/jobhistory/logs</value>
      </property>

      <!-- Keep logs for 7 days -->
      <property>
          <name>yarn.log-aggregation.retain-seconds</name>
          <value>604800</value>
      </property>

  8. Distribute Hadoop

    xsync /opt/module/hadoop-3.1.3/

  9. Start the whole cluster

    1. If the cluster is being started for the first time, format the NameNode on ha01 (before formatting, be sure to stop all NameNode and DataNode processes from the previous run, and then delete the data and log data)

      bin/hdfs namenode -format

    2. Start HDFS

      sbin/start-dfs.sh

    3. Start YARN on the node where the ResourceManager is configured (ha02)

      sbin/start-yarn.sh

    4. View the HDFS web UI at http://ha01:9870/

  10. Hadoop cluster start/stop script

    1. Go to the /home/damoncai/bin directory

    2. Edit the script

      vim hdp.sh

    3. Enter the following content

    #!/bin/bash
    if [ $# -lt 1 ]
    then
        echo "No Args Input..."
        exit 1
    fi
    case $1 in
    "start")
            echo " =================== Starting the Hadoop cluster ==================="

            echo " --------------- Starting HDFS ---------------"
            ssh ha01 "/opt/module/hadoop-3.1.3/sbin/start-dfs.sh"
            echo " --------------- Starting YARN ---------------"
            ssh ha02 "/opt/module/hadoop-3.1.3/sbin/start-yarn.sh"
            echo " --------------- Starting historyserver ---------------"
            ssh ha01 "/opt/module/hadoop-3.1.3/bin/mapred --daemon start historyserver"
    ;;
    "stop")
            echo " =================== Stopping the Hadoop cluster ==================="

            echo " --------------- Stopping historyserver ---------------"
            ssh ha01 "/opt/module/hadoop-3.1.3/bin/mapred --daemon stop historyserver"
            echo " --------------- Stopping YARN ---------------"
            ssh ha02 "/opt/module/hadoop-3.1.3/sbin/stop-yarn.sh"
            echo " --------------- Stopping HDFS ---------------"
            ssh ha01 "/opt/module/hadoop-3.1.3/sbin/stop-dfs.sh"
    ;;
    *)
        echo "Input Args Error..."
    ;;
    esac

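# Project Experience

# Project Experience: HDFS Multi-Directory Storage

  1. Add a new disk to the Linux system
  2. Check the disk layout of the production servers
  3. Configure multiple directories in hdfs-site.xml, and pay attention to the access permissions of the newly mounted disks

The path where an HDFS DataNode stores its data is set by the dfs.datanode.data.dir parameter, whose default value is file://${hadoop.tmp.dir}/dfs/data. If the server has multiple disks, this parameter must be changed. For the production disk layout referred to in step 2, it should be set to the following value.

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///dfs/data1,file:///hd2/dfs/data2,file:///hd3/dfs/data3,file:///hd4/dfs/data4</value>
    </property>

Note: because the disk layout differs from server to server, this setting does not need to be distributed after it is configured.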

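# Cluster Data Balancing

# Balancing Data Between Nodes

  1. Start the balancer

    start-balancer.sh -threshold 10

    The argument 10 means that disk space utilization across the nodes in the cluster should differ by no more than 10%. Adjust it to suit your situation.

  2. Stop the balancer

    stop-balancer.sh

    Note: because HDFS has to start a separate Rebalance Server to carry out the rebalance operation, avoid running start-balancer.sh on the NameNode if possible. Pick a relatively idle machine instead.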
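
As a rough illustration of what the threshold means, the Python sketch below checks whether every node's utilization lies within a given number of percentage points of the overall cluster utilization. The function name `is_balanced` and the `(used, capacity)` input format are assumptions made for this example, and comparing each node against the cluster-wide utilization is one reading of the rule above. It is not the balancer's actual implementation.

```python
def is_balanced(nodes, threshold=10.0):
    """Return True if every node is within `threshold` percentage points
    of the overall cluster utilization.

    `nodes` is a list of (used, capacity) pairs in the same unit.
    Hypothetical helper for illustration only.
    """
    total_used = sum(used for used, _ in nodes)
    total_capacity = sum(capacity for _, capacity in nodes)
    cluster_util = 100.0 * total_used / total_capacity
    return all(
        abs(100.0 * used / capacity - cluster_util) <= threshold
        for used, capacity in nodes
    )


# Three nodes at 50%, 55%, and 60%: all within 10 points of the 55% average.
print(is_balanced([(50, 100), (55, 100), (60, 100)]))
# Two nodes at 10% and 90%: each is 40 points away from the 50% average.
print(is_balanced([(10, 100), (90, 100)]))
```

A smaller threshold asks for a tighter spread and therefore more data movement before the cluster counts as balanced.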

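# Balancing Data Between Disks

  1. Generate a balancing plan (with only one disk, as here, no plan is generated)

    hdfs diskbalancer -plan ha02

  2. Execute the balancing plan

    hdfs diskbalancer -execute ha02.plan.json

  3. Check the status of the current balancing task

    hdfs diskbalancer -query ha02

  4. Cancel the balancing task

    hdfs diskbalancer -cancel ha02.plan.json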

# Project Experience: Configuring LZO Compression Support

  1. Build hadoop-lzo

    Hadoop does not support LZO compression out of the box, so the open-source hadoop-lzo component provided by Twitter is needed. hadoop-lzo must be compiled against Hadoop and LZO. The build steps are as follows.

    1. Prepare the environment (installing with yum is enough: yum -y install gcc-c++ lzo-devel zlib-devel autoconf automake libtool)

      1. maven (download and install it, configure the environment variables, and add the Aliyun mirror to settings.xml)
      2. gcc-c++
      3. zlib-devel
      4. autoconf
      5. automake
      6. libtool
    2. Download, build, and install LZO

      wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz

      tar -zxvf lzo-2.10.tar.gz

      cd lzo-2.10

      ./configure --prefix=/usr/local/hadoop/lzo/

      make

      make install

    3. Build the hadoop-lzo source

      1. Download the hadoop-lzo source from https://github.com/twitter/hadoop-lzo/archive/master.zip

      2. After extracting it, edit pom.xml

        <hadoop.current.version>3.1.3</hadoop.current.version>

      3. Declare two temporary environment variables

        export C_INCLUDE_PATH=/usr/local/hadoop/lzo/include
        export LIBRARY_PATH=/usr/local/hadoop/lzo/lib

      4. Build

        Enter hadoop-lzo-master and run the Maven build command

        mvn package -Dmaven.test.skip=true

      5. Enter target. hadoop-lzo-0.4.21-SNAPSHOT.jar is the successfully built hadoop-lzo component

    4. Put the built hadoop-lzo-0.4.20.jar into hadoop-3.1.3/share/hadoop/common/

    5. Sync hadoop-lzo-0.4.20.jar to ha02 and ha03

      xsync hadoop-lzo-0.4.20.jar

    6. Add configuration to core-site.xml to support LZO compression

      <configuration>
          <property>
              <name>io.compression.codecs</name>
              <value>
                  org.apache.hadoop.io.compress.GzipCodec,
                  org.apache.hadoop.io.compress.DefaultCodec,
                  org.apache.hadoop.io.compress.BZip2Codec,
                  org.apache.hadoop.io.compress.SnappyCodec,
                  com.hadoop.compression.lzo.LzoCodec,
                  com.hadoop.compression.lzo.LzopCodec
              </value>
          </property>
      
          <property>
              <name>io.compression.codec.lzo.class</name>
              <value>com.hadoop.compression.lzo.LzoCodec</value>
          </property>
      </configuration>
      
    7. Sync core-site.xml to ha02 and ha03

      xsync core-site.xml

    8. Start the cluster and check it

      sbin/start-dfs.sh
      sbin/start-yarn.sh

    9. Test: prepare the data

      hadoop fs -mkdir /input
      hadoop fs -put README.txt /input

    10. Test: compression

      hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.output.fileoutputformat.compress=true -Dmapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzopCodec  /input /output

# Project Experience: Creating LZO Indexes

  1. Create an index for an LZO file

    Whether an LZO-compressed file can be split depends on its index, so an index has to be created manually for each LZO-compressed file. Without an index, an LZO file produces only one split.

    hadoop jar /path/to/your/hadoop-lzo.jar com.hadoop.compression.lzo.DistributedLzoIndexer big_file.lzo

  2. Test

    1. Upload bigtable.lzo (200 MB) to the /input directory on the cluster

      hadoop fs -mkdir /input

      hadoop fs -put bigtable.lzo /input

    2. Run the wordcount program

      hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat /input /output1

    3. Build an index for the uploaded LZO file

      hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar  com.hadoop.compression.lzo.DistributedLzoIndexer /input/bigtable.lzo

    4. Run the wordcount program again

      hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.3.jar wordcount -Dmapreduce.job.inputformat.class=com.hadoop.mapreduce.LzoTextInputFormat /input /output2

    5. Note: the jobs above may throw an exception at runtime (the setting below addresses YARN's virtual memory check)

      Solution: add the following configuration to /opt/module/hadoop-3.1.3/etc/hadoop/yarn-site.xml on ha01, distribute it to ha02 and ha03, and restart the cluster.

      <!-- Whether to start a thread that checks the virtual memory used by each task and kills any task that exceeds its allocation. The default is true -->
      <property>
         <name>yarn.nodemanager.vmem-check-enabled</name>
         <value>false</value>
      </property>

# Project Experience: Hadoop Parameter Tuning

# HDFS Parameter Tuning (hdfs-site.xml)

The number of Namenode RPC server threads that listen to requests from clients. If dfs.namenode.servicerpc-address is not configured then Namenode RPC server threads listen to requests from all nodes.

The NameNode has a pool of worker threads that handle concurrent heartbeats from the DataNodes and concurrent metadata operations from clients.
For large clusters, or clusters with many clients, the dfs.namenode.handler.count parameter usually needs to be raised above its default of 10.

    <property>
        <name>dfs.namenode.handler.count</name>
        <value>21</value>
    </property>

The value can be computed with a few lines of Python as 20 × ln(cluster size). The session below uses a cluster size of 8, which gives 41. The value 21 configured above corresponds to a cluster size of 3.

    [atguigu@hadoop102 ~]$ python
    Python 2.7.5 (default, Apr 11 2018, 07:36:10) 
    [GCC 4.8.5 20150623 (Red Hat 4.8.5-28)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import math
    >>> print int(20*math.log(8))
    41
    >>> quit()
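
The interactive session above runs on Python 2. For reference, here is a minimal Python 3 sketch of the same calculation. The function name `namenode_handler_count` is made up for this example, and treating the input as the cluster size is an assumption.

```python
import math


def namenode_handler_count(cluster_size):
    """Estimate dfs.namenode.handler.count as int(20 * ln(cluster_size)).

    Hypothetical helper. `cluster_size` is assumed to be the node count.
    """
    if cluster_size < 1:
        raise ValueError("cluster_size must be at least 1")
    return int(20 * math.log(cluster_size))


print(namenode_handler_count(8))  # 41, the value printed in the session above
print(namenode_handler_count(3))  # 21, the value used in the XML snippet
```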

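# YARN Parameter Tuning (yarn-site.xml)

(1) Scenario: 7 machines in total, several hundred million records per day, with the pipeline data source -> Flume -> Kafka -> HDFS -> Hive

Problem: the statistics are computed mainly with HiveSQL. There is no data skew, small files have already been merged, JVM reuse is enabled, IO is not blocked, and less than 50% of memory is in use. Even so, jobs run very slowly, and when a peak of data arrives the whole cluster goes down. Is there a way to optimize this?

(2) Solution:

Set the NodeManager's memory as close as possible to the server's actual memory. For example, if a server has 128 GB of memory but the NodeManager keeps its default of 8 GB, at most 8 GB can be used unless the parameter is changed. Likewise, set the number of CPU cores used by the NodeManager as close as possible to the server's core count.

① yarn.nodemanager.resource.memory-mb: memory available to the NodeManager

② yarn.nodemanager.resource.cpu-vcores: CPU cores available to the NodeManager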

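# Zookeeper Installation

# Installing ZK

  1. Upload the archive and extract it

  2. Distribute ZK

    xsync zookeeper-3.5.7

  3. Create zkData in the /opt/module/zookeeper-3.5.7/ directory

    mkdir zkData

  4. Create a file named myid in the /opt/module/zookeeper-3.5.7/zkData directory

    vi myid

    Add the number that corresponds to this server to the file: 1

  5. Copy the configured zookeeper to the other machines and change the content of their myid files to 2 and 3

  6. Configure the zoo.cfg file

    1. Rename zoo_sample.cfg in the /opt/module/zookeeper-3.5.7/conf directory to zoo.cfg

      mv zoo_sample.cfg zoo.cfg

    2. Open the zoo.cfg file

      vim zoo.cfg

      # Change the data storage path
      dataDir=/opt/module/zookeeper-3.5.7/zkData

      # Add the following configuration
      #######################cluster##########################
      server.1=ha01:2888:3888
      server.2=ha02:2888:3888
      server.3=ha03:2888:3888

    3. Sync the zoo.cfg configuration file

      xsync zoo.cfg

    4. Configuration parameters explained

      server.A=B:C:D

      A is a number that identifies which server this is.
      In cluster mode, a file named myid is placed in the dataDir directory and contains the value of A. At startup, Zookeeper reads this file and compares its value with the configuration in zoo.cfg to determine which server it is.
      B is the address of the server.
      C is the port this server, as a Follower, uses to exchange information with the cluster's Leader.
      D is the port the servers use to communicate with each other during an election, which is needed to choose a new Leader if the current Leader goes down.

    5. Cluster operations

      bin/zkServer.sh start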
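
To make the server.A=B:C:D format described above concrete, the following Python sketch splits one such line into its four parts. The function name `parse_server_line` and the dictionary keys are made up for this example. Zookeeper itself does not use this code, and the sketch only handles the simple form shown in this document.

```python
import re

# server.<A>=<B>:<C>:<D>, e.g. server.1=ha01:2888:3888
SERVER_LINE = re.compile(r"^server\.(\d+)=([^:\s]+):(\d+):(\d+)$")


def parse_server_line(line):
    """Split a server.A=B:C:D line into id, host, and the two ports."""
    match = SERVER_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a server.A=B:C:D line: {line!r}")
    server_id, host, quorum_port, election_port = match.groups()
    return {
        "id": int(server_id),                 # A: must match the myid file
        "host": host,                         # B: server address
        "quorum_port": int(quorum_port),      # C: Follower <-> Leader traffic
        "election_port": int(election_port),  # D: leader election traffic
    }


print(parse_server_line("server.1=ha01:2888:3888"))
```

The `id` field is the value that has to appear in that server's myid file.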

# ZK Cluster Start/Stop Script

  1. Create the script in the /home/damoncai/bin directory on ha01

    vim zk.sh

  2. Write the following content in the script

    #!/bin/bash

    case $1 in
    "start"){
        for i in ha01 ha02 ha03
        do
            echo "---------- starting zookeeper on $i ------------"
            ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh start"
        done
    };;
    "stop"){
        for i in ha01 ha02 ha03
        do
            echo "---------- stopping zookeeper on $i ------------"
            ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh stop"
        done
    };;
    "status"){
        for i in ha01 ha02 ha03
        do
            echo "---------- zookeeper status on $i ------------"
            ssh $i "/opt/module/zookeeper-3.5.7/bin/zkServer.sh status"
        done
    };;
    esac

  3. Make the script executable

    chmod u+x zk.sh

  4. Use the script to start, stop, or check the Zookeeper cluster

    zk.sh start|stop|status

# Kafka Installation

# Kafka Cluster Installation

  1. Upload the archive and extract it

  2. Rename the extracted directory to kafka

  3. Create a logs directory under /opt/module/kafka

  4. Edit the configuration file: vi server.properties

    # Change or add the following settings:
    # Globally unique broker ID, which must not be duplicated
    broker.id=0
    # Enable topic deletion
    delete.topic.enable=true
    # Directory where Kafka stores its log data
    log.dirs=/opt/module/kafka/data
    # Connection string for the Zookeeper cluster
    zookeeper.connect=ha01:2181,ha02:2181,ha03:2181/kafka

  5. Configure the environment variables

    sudo vi /etc/profile.d/my_env.sh

    #KAFKA_HOME
    export KAFKA_HOME=/opt/module/kafka
    export PATH=$PATH:$KAFKA_HOME/bin

    source /etc/profile.d/my_env.sh

  6. Distribute the installation package

  7. On ha02 and ha03, edit /opt/module/kafka/config/server.properties and set broker.id=1 and broker.id=2 respectively

  8. Start the cluster by running the following on each machine

    bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties

  9. Stop the cluster

    bin/kafka-server-stop.sh

# Kafka Cluster Start/Stop Script

  1. Create the script kf.sh in the /home/damoncai/bin directory

  2. Write the following content in the script

    #!/bin/bash

    case $1 in
    "start"){
        for i in ha01 ha02 ha03
        do
            echo " -------- starting Kafka on $i -------"
            ssh $i "/opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties"
        done
    };;
    "stop"){
        for i in ha01 ha02 ha03
        do
            echo " -------- stopping Kafka on $i -------"
            ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh stop"
        done
    };;
    esac

  3. Make the script executable

    chmod u+x kf.sh

  4. Run it

    kf.sh start | stop

# Common Kafka Commands

  1. List Kafka topics

    bin/kafka-topics.sh --zookeeper ha01:2181/kafka --list

  2. Create a Kafka topic

    bin/kafka-topics.sh --zookeeper ha01:2181,ha02:2181,ha03:2181/kafka  --create --replication-factor 1 --partitions 1 --topic topic_log

  3. Delete a Kafka topic

    bin/kafka-topics.sh --delete --zookeeper ha01:2181,ha02:2181,ha03:2181/kafka --topic topic_log

  4. Produce messages

    bin/kafka-console-producer.sh \
    --broker-list ha01:9092 --topic topic_log
    >hello world
    >atguigu  atguigu

  5. Consume messages

    bin/kafka-console-consumer.sh \
    --bootstrap-server ha01:9092 --from-beginning --topic topic_log

  6. Describe a Kafka topic

    bin/kafka-topics.sh --zookeeper ha01:2181/kafka \
    --describe --topic topic_log

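# Project Experience: Calculating the Number of Kafka Machines

Number of Kafka machines (rule of thumb) = 2 * (peak production rate * replica count / 100) + 1

Once you have the peak production rate and the chosen replica count, you can estimate how many Kafka machines need to be deployed.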
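
The rule of thumb above can be written as a small Python function. This is only a sketch: the function name `kafka_machine_count` is made up for this example, the peak rate is assumed to be in MB/s, and rounding fractional results up with `math.ceil` is an assumption the formula itself does not state.

```python
import math


def kafka_machine_count(peak_mb_per_sec, replicas):
    """Rule of thumb: 2 * (peak production rate * replicas / 100) + 1.

    Hypothetical helper. Fractional results are rounded up, which is an
    assumption added for this sketch.
    """
    return math.ceil(2 * (peak_mb_per_sec * replicas / 100) + 1)


# Peak production rate of 50 MB/s with 2 replicas
print(kafka_machine_count(50, 2))  # 3
```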

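  1. Peak production rate

    The peak production rate can be obtained by stress testing.

  2. Replica count

    The default replica count is 1. In production, 2 or 3 are both used, with 2 being the most common.

    More replicas improve reliability but reduce network transfer efficiency.

    For example, suppose the peak production rate is 50 MB/s and the replica count is 2.

    Number of Kafka machines = 2 * (50 * 2 / 100) + 1 = 3

# Project Experience: Kafka Stress Testing

  1. Kafka stress testing

    Use the scripts that ship with Kafka to stress-test it.

    kafka-consumer-perf-test.sh

    kafka-producer-perf-test.sh

    During a stress test, with disk read/write speed held constant, you can see where the bottleneck appears (CPU, memory, or network IO). It is usually network IO.

  2. Kafka Producer stress test

    1. Prepare the test environment

      1. Limit the network bandwidth of ha01, ha02, and ha03 to 100 Mbps each.

      2. Shut down ha01 and clone ha04 from it (change the IP address and hostname)

      3. Leave the bandwidth of ha04 unlimited

      4. Create a topic named test with 3 partitions and 2 replicas

        bin/kafka-topics.sh --zookeeper ha01:2181,ha02:2181,ha03:2181/kafka --create --replication-factor 2 --partitions 3 --topic test

      5. The two perf-test scripts are in the /opt/module/kafka/bin directory. Run the producer test:

        bin/kafka-producer-perf-test.sh  --topic test --record-size 100 --num-records 10000000 --throughput -1 --producer-props bootstrap.servers=ha01:9092,ha02:9092,ha03:9092

        Explanation:

        record-size is the size of one message, in bytes.

        num-records is the total number of messages to send.

        throughput is the number of messages per second. Setting it to -1 disables throttling, so data is produced as fast as possible, which measures the producer's maximum throughput.

        The total network bandwidth of the three machines ha01, ha02, and ha03 is about 30 MB/s. With two replicas, Kafka's throughput is 30 MB/s / 2 (replicas) = 15 MB/s.

        Conclusion: both network bandwidth and the replica count affect throughput.

      6. Adjust batch.size

        The default value of batch.size is 16 KB.

        A small batch.size lowers throughput. For example, a batch size of 0 disables batching entirely, so messages are sent one at a time.

        A batch.size that is too large increases send latency. For example, if the batch is set to 64 KB but it takes 5 seconds to fill, the message is only sent after 5 seconds, so its latency is 5 seconds.

        bin/kafka-producer-perf-test.sh  --topic test --record-size 100 --num-records 10000000 --throughput -1 --producer-props bootstrap.servers=ha01:9092,ha02:9092,ha03:9092 batch.size=500

        Output:

        69169 records sent, 13833.8 records/sec (1.32 MB/sec), 2517.6 ms avg latency, 4299.0 ms max latency.
        105372 records sent, 21074.4 records/sec (2.01 MB/sec), 6748.4 ms avg latency, 9016.0 ms max latency.
        113188 records sent, 22637.6 records/sec (2.16 MB/sec), 11348.0 ms avg latency, 13196.0 ms max latency.
        108896 records sent, 21779.2 records/sec (2.08 MB/sec), 12272.6 ms avg latency, 12870.0 ms max latency.

      7. linger.ms

        If batch.size is set to 64 KB, what happens if 64 KB has still not accumulated after, say, 10 minutes?

        You can set linger.ms. For example, with linger.ms=5 (milliseconds), the data is sent after 5 ms even if it has not reached 64 KB.

      8. Summary

        When both batch.size and linger.ms are set, the messages are sent as soon as either condition is met.

        Kafka has to balance high throughput against latency.

  3. Kafka Consumer stress test

    1. For the consumer test, if none of the four resources (IO, CPU, memory, network) can be changed, consider increasing the number of partitions to improve performance.

      bin/kafka-consumer-perf-test.sh --broker-list ha01:9092,ha02:9092,ha03:9092 --topic test --fetch-size 10000 --messages 10000000 --threads 1

      --broker-list specifies the Kafka cluster addresses

      --topic specifies the topic name

      --fetch-size specifies the amount of data per fetch

      --messages is the total number of messages to consume

      Test results:

      start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec

      2021-08-03 21:17:21:778, 2021-08-03 21:18:19:775, 514.7169, 8.8749, 5397198, 93059.9514

      The columns are the test start time, the test end time, the data consumed (514.7169 MB), and the throughput (8.8749 MB/s).

    2. Adjust fetch-size

      Increase the fetch-size value and observe the consumer throughput

      bin/kafka-consumer-perf-test.sh --broker-list ha01:9092,ha02:9092,ha03:9092 --topic test --fetch-size 100000 --messages 10000000 --threads 1

      Test results:

      start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec

      2021-08-03 21:22:57:671, 2021-08-03 21:23:41:938, 514.7169, 11.6276, 5397198, 121923.7355

    3. Summary

      Throughput is affected by network bandwidth and fetch-size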
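
As a sanity check on the consumer results above, the MB.sec column can be recomputed from the start time, end time, and consumed data. The Python sketch below does that for the two runs. The function name `consumer_throughput_mb_per_sec` is made up for this example, and the timestamp format string is inferred from the output lines shown in this document.

```python
from datetime import datetime

# Timestamps look like "2021-08-03 21:17:21:778" (milliseconds after the last colon)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S:%f"


def consumer_throughput_mb_per_sec(start_time, end_time, consumed_mb):
    """Recompute MB.sec as consumed data divided by elapsed seconds."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    elapsed_seconds = (end - start).total_seconds()
    return consumed_mb / elapsed_seconds


# Run with --fetch-size 10000
first = consumer_throughput_mb_per_sec(
    "2021-08-03 21:17:21:778", "2021-08-03 21:18:19:775", 514.7169
)
# Run with --fetch-size 100000
second = consumer_throughput_mb_per_sec(
    "2021-08-03 21:22:57:671", "2021-08-03 21:23:41:938", 514.7169
)
print(f"{first:.2f} MB/s -> {second:.2f} MB/s")
```

Both values agree with the reported MB.sec column. The same amount of data was consumed in less time with the larger fetch size.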

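# Project Experience: Calculating the Number of Kafka Partitions

(1) Create a topic with a single partition

(2) Measure the producer throughput and the consumer throughput of this topic.

(3) Call these values Tp and Tc. The unit can be MB/s.

(4) If the total target throughput is Tt, then number of partitions = Tt / min(Tp, Tc)

For example: producer throughput = 20 MB/s, consumer throughput = 50 MB/s, target throughput = 100 MB/s.

Number of partitions = 100 / 20 = 5

https://blog.csdn.net/weixin_42641909/article/details/89294698

The number of partitions is usually set to 3-10.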
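
The partition formula above can be sketched in Python as follows. The function name `kafka_partition_count` is made up for this example, and rounding up to a whole number of partitions is an assumption added here.

```python
import math


def kafka_partition_count(target_mb_s, producer_mb_s, consumer_mb_s):
    """Partitions = Tt / min(Tp, Tc), rounded up to a whole partition.

    Hypothetical helper. The rounding is an assumption of this sketch.
    """
    return math.ceil(target_mb_s / min(producer_mb_s, consumer_mb_s))


# Tp = 20 MB/s, Tc = 50 MB/s, Tt = 100 MB/s
print(kafka_partition_count(100, 20, 50))  # 5
```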

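# Log Collection with Flume

# Flume Installation

Cluster plan:

|                        | ha01  | ha02  | ha03 |
| ---------------------- | ----- | ----- | ---- |
| Flume (log collection) | Flume | Flume |      |

Installation links

  1. Flume website: http://flume.apache.org/
  2. Documentation: http://flume.apache.org/FlumeUserGuide.html
  3. Downloads: http://archive.apache.org/dist/flume/

Installation and deployment

  1. Upload the archive and extract it

  2. Rename the directory to flume

  3. Delete guava-11.0.2.jar from the lib directory for compatibility with Hadoop 3.1.3

    Note: on any node where guava-11.0.2.jar has been deleted, the Hadoop environment variables must be configured. Otherwise an exception is thrown.

  4. Rename flume-env.sh.template under flume/conf to flume-env.sh and configure it

    export JAVA_HOME=/opt/module/jdk1.8.0_212

  5. Distribute flume

    xsync flume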

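# Project Experience: Choosing Flume Components

  1. Source

    1. Advantages of Taildir Source over Exec Source and Spooling Directory Source

      Taildir Source: supports resuming from the last read position and monitoring multiple directories. Before Flume 1.6 you had to write a custom Source that recorded the read position of each file in order to resume. It does not lose data, but it may produce duplicates.

    2. Exec Source can collect data in real time, but data is lost if Flume is not running or the shell command fails.

    3. Spooling Directory Source monitors a directory and supports resuming from the last read position.

    4. How should batchSize be set?

      When an Event is about 1 KB, 500-1000 is appropriate (the default is 100)

  2. Channel

    Use Kafka Channel, which removes the need for a Sink and improves efficiency. Kafka Channel stores its data in Kafka, so the data is kept on disk.

    Note that before Flume 1.7, Kafka Channel was rarely used because the parseAsFlumeEvent setting had no effect: whether it was set to true or false, the data was always converted to Flume Events. As a result, the Flume headers were always written into the Kafka message together with the content, which is not what is wanted here. Only the content needs to be written.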

# Log Collection Flume Configuration

  1. Flume configuration analysis

    Flume reads the log files directly. The log file names have the format app.yyyy-mm-dd.log.

  2. The Flume configuration is as follows

    1. Create the file file-flume-kafka.conf in the /opt/module/flume/conf directory

    2. Configuration content

      # Name the components
      a1.sources = r1
      a1.channels = c1

      # Describe the source
      a1.sources.r1.type = TAILDIR
      a1.sources.r1.filegroups = f1
      a1.sources.r1.filegroups.f1 = /opt/module/applog/log/app.*
      a1.sources.r1.positionFile = /opt/module/flume/taildir_position.json
      a1.sources.r1.interceptors =  i1
      a1.sources.r1.interceptors.i1.type = top.damoncai.flume.interceptor.ETLInterceptor$Builder

      # Describe the channel
      a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
      a1.channels.c1.kafka.bootstrap.servers = ha01:9092,ha02:9092
      a1.channels.c1.kafka.topic = topic_log
      a1.channels.c1.parseAsFlumeEvent = false

      # Bind the source to the channel (no sink is needed with Kafka Channel)
      a1.sources.r1.channels = c1

# Flume Interceptor

    1. Create the project

    2. Import the dependencies

      <dependencies>
          <dependency>
              <groupId>org.apache.flume</groupId>
              <artifactId>flume-ng-core</artifactId>
              <version>1.9.0</version>
              <scope>provided</scope>
          </dependency>
      
          <dependency>
              <groupId>com.alibaba</groupId>
              <artifactId>fastjson</artifactId>
              <version>1.2.62</version>
          </dependency>
      </dependencies>
      
      <build>
          <plugins>
              <plugin>
                  <artifactId>maven-compiler-plugin</artifactId>
                  <version>2.3.2</version>
                  <configuration>
                      <source>1.8</source>
                      <target>1.8</target>
                  </configuration>
              </plugin>
              <plugin>
                  <artifactId>maven-assembly-plugin</artifactId>
                  <configuration>
                      <descriptorRefs>
                          <descriptorRef>jar-with-dependencies</descriptorRef>
                      </descriptorRefs>
                  </configuration>
                  <executions>
                      <execution>
                          <id>make-assembly</id>
                          <phase>package</phase>
                          <goals>
                              <goal>single</goal>
                          </goals>
                      </execution>
                  </executions>
              </plugin>
          </plugins>
      </build>
      

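      Note: the provided scope means the jar is used at compile time but left out of the packaged artifact. The cluster already has the Flume jars, so the dependency is only needed for local compilation.

    3. Create the JSONUtils class in the com.atguigu.flume.interceptor package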

      package com.atguigu.flume.interceptor;
      
      import com.alibaba.fastjson.JSON;
      import com.alibaba.fastjson.JSONException;
      
      public class JSONUtils {
          public static boolean isJSONValidate(String log){
              try {
                  JSON.parse(log);
                  return true;
              }catch (JSONException e){
                  return false;
              }
          }
      }
      
    4. Create the ETLInterceptor class in the com.atguigu.flume.interceptor package

      package com.atguigu.flume.interceptor;
      
      import com.alibaba.fastjson.JSON;
      import org.apache.flume.Context;
      import org.apache.flume.Event;
      import org.apache.flume.interceptor.Interceptor;
      
      import java.nio.charset.StandardCharsets;
      import java.util.Iterator;
      import java.util.List;
      
      public class ETLInterceptor implements Interceptor {
      
          @Override
          public void initialize() {
      
          }
      
          @Override
          public Event intercept(Event event) {
      
              byte[] body = event.getBody();
              String log = new String(body, StandardCharsets.UTF_8);
      
              if (JSONUtils.isJSONValidate(log)) {
                  return event;
              } else {
                  return null;
              }
          }
      
          @Override
          public List<Event> intercept(List<Event> list) {
      
              Iterator<Event> iterator = list.iterator();
      
              while (iterator.hasNext()){
                  Event next = iterator.next();
                  if(intercept(next)==null){
                      iterator.remove();
                  }
              }
      
              return list;
          }
      
          public static class Builder implements Interceptor.Builder{
      
              @Override
              public Interceptor build() {
                  return new ETLInterceptor();
              }
              @Override
              public void configure(Context context) {
      
              }
      
          }
      
          @Override
          public void close() {
      
          }
      }           
      
    5. Build the package

    6. Copy the built jar into /opt/module/flume/lib on ha01 first.

    7. Distribute Flume to ha02 and ha03

      xsync flume/
      
    8. Start Flume on ha02 and ha03

      bin/flume-ng agent --name a1 --conf-file conf/file-flume-kafka.conf &
      

# Testing the Flume-Kafka channel

  1. Generate logs

    lg.sh
    
  2. Consume the Kafka data and check that records show up in the console

    bin/kafka-console-consumer.sh \
    --bootstrap-server ha01:9092 --from-beginning --topic topic_log
    

# Start/stop script for the log-collection Flume

  1. Create the script f1.sh in /home/atguigu/bin

  2. Fill in the following content

    #! /bin/bash
    
    case $1 in
    "start"){
            for i in ha01 ha02
            do
                    echo " --------starting collection flume on $i-------"
                    ssh $i "nohup /opt/module/flume/bin/flume-ng agent --conf-file /opt/module/flume/conf/file-flume-kafka.conf --name a1 -Dflume.root.logger=INFO,LOGFILE >/opt/module/flume/log1.txt 2>&1  &"
            done
    };;
    "stop"){
            for i in ha01 ha02
            do
                    echo " --------stopping collection flume on $i-------"
                    ssh $i "ps -ef | grep file-flume-kafka | grep -v grep |awk  '{print \$2}' | xargs -n1 kill -9 "
            done
    
    };;
    esac
    

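    Note 1: nohup keeps a process running after you log out or close the terminal. The name means "no hang up": the command ignores the hangup signal.

    Note 2: awk splits fields on whitespace (spaces and tabs) by default.

    Note 3: inside double quotes, $2 would be expanded to the script's second argument. Here it is meant to be awk's second field, so it must be escaped and written as \$2.

    Note 4: xargs takes the output of the preceding command and passes it as arguments to the following command.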

  3. Make the script executable

    chmod u+x f1.sh
    
  4. Start or stop the collection Flume across the cluster with f1.sh

    f1.sh start | stop
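
The quoting rule for `$2` in the stop command can be checked locally without ssh. The following is a minimal sketch, assuming bash and awk are available; the script arguments and the sample `ps -ef` line are made up for illustration.

```shell
#!/bin/bash
# Simulate a script invoked as: f1.sh stop extra-arg (hypothetical arguments)
set -- stop extra-arg

# Unescaped: the local shell replaces $2 with the script's second argument
# before the string would ever reach ssh.
unescaped="awk '{print $2}'"

# Escaped: \$2 survives as a literal $2, so awk on the remote host sees it.
escaped="awk '{print \$2}'"

echo "unescaped: $unescaped"
echo "escaped:   $escaped"

# A made-up ps -ef line; field 2 is the PID that xargs would hand to kill.
sample_line="damoncai  4321     1  0 10:00 ?  00:00:05 java flume file-flume-kafka.conf"
pid=$(echo "$sample_line" | awk '{print $2}')
echo "pid: $pid"
```

The unescaped variant ends up asking awk to print a variable named after the script argument, which is why the stop command in the script writes `\$2`.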
    

# Flume that consumes Kafka data

Cluster plan

                         hadoop102   hadoop103   hadoop104
Flume (consumes Kafka)                           Flume

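# Project experience: choosing Flume components

  1. FileChannel vs. MemoryChannel

    MemoryChannel transfers data faster, but it keeps the data in the JVM heap, so events are lost if the agent process dies. It suits workloads that can tolerate some data loss.

    FileChannel is slower than MemoryChannel but much safer: if the agent process dies, the data can be recovered after a restart.

    Choosing between them:

    Financial companies, and any company that needs money-related data to be exact, usually choose FileChannel.

    For ordinary log data, MemoryChannel is the usual choice (JD.com is said to lose 1 to 2 million records a day internally, which is regarded as normal).

  2. FileChannel tuning

    Point dataDirs at several paths, each on a different disk, to raise Flume's throughput.

    The official documentation says: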

    Comma separated list of directories for storing log files. Using multiple directories on separate disks can improve file channel peformance
    

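    Also try to put checkpointDir and backupCheckpointDir on different disks, so that if the checkpoint is corrupted the channel can recover quickly from backupCheckpointDir.

  3. Sink: HDFS Sink

    1. What is the impact of storing many small files in HDFS?

      **Metadata:** every small file has its own metadata (path, file name, owner, group, permissions, creation time, and so on), all held in NameNode memory. Too many small files therefore consume a large amount of NameNode memory and hurt NameNode performance and longevity.

      **Computation:** by default MapReduce launches one Map task per small file, which hurts compute performance badly. It also adds disk seek time.

    2. Handling small files in HDFS

      Three parameters control file rolling: hdfs.rollInterval, hdfs.rollSize, and hdfs.rollCount. With their official default values, writes to HDFS produce small files.

      With hdfs.rollInterval=3600, hdfs.rollSize=134217728, and hdfs.rollCount=0 combined, the result is:

      1. A file rolls over to a new file once it reaches 128 MB
      2. A file rolls over to a new file once it has been open for more than 3600 seconds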
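
How the three roll settings combine can be sketched as a small decision function. This is only a model of the rules described above, not Flume's own code; the function name `should_roll` and its arguments are hypothetical, and a threshold of 0 is treated as disabled.

```shell
#!/bin/bash
# Values taken from the text above.
ROLL_INTERVAL=3600        # seconds; 0 would disable time-based rolling
ROLL_SIZE=134217728       # bytes (128 MB); 0 would disable size-based rolling
ROLL_COUNT=0              # events; 0 disables count-based rolling

# should_roll AGE_SECONDS SIZE_BYTES EVENT_COUNT
# Prints "roll" if any enabled threshold is reached, otherwise "keep".
should_roll() {
  local age=$1 size=$2 count=$3
  if [ "$ROLL_INTERVAL" -gt 0 ] && [ "$age" -ge "$ROLL_INTERVAL" ]; then
    echo roll; return
  fi
  if [ "$ROLL_SIZE" -gt 0 ] && [ "$size" -ge "$ROLL_SIZE" ]; then
    echo roll; return
  fi
  if [ "$ROLL_COUNT" -gt 0 ] && [ "$count" -ge "$ROLL_COUNT" ]; then
    echo roll; return
  fi
  echo keep
}

should_roll 100 1024 999999      # young, small file: the event count is ignored
should_roll 3600 1024 5          # age threshold reached
should_roll 10 134217728 5       # size threshold reached
```

Because rollCount is 0, a file is closed only by age or size, so a busy topic yields files of about 128 MB and a quiet one yields one file per hour.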

# Consumer Flume configuration

  1. Flume configuration analysis

  2. The concrete Flume configuration:

    1. Create kafka-flume-hdfs.conf in /opt/module/flume/conf on ha03

      ## Components
      a1.sources=r1
      a1.channels=c1
      a1.sinks=k1
      
      ## source1
      a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
      a1.sources.r1.batchSize = 5000
      a1.sources.r1.batchDurationMillis = 2000
      a1.sources.r1.kafka.bootstrap.servers = ha01:9092,ha02:9092,ha03:9092
      a1.sources.r1.kafka.topics=topic_log
      a1.sources.r1.interceptors = i1
      a1.sources.r1.interceptors.i1.type = top.damoncai.flume.interceptor.TimeStampInterceptor$Builder
      
      ## channel1
      a1.channels.c1.type = file
      a1.channels.c1.checkpointDir = /opt/module/flume/checkpoint/behavior1
      a1.channels.c1.dataDirs = /opt/module/flume/data/behavior1/
      
      
      ## sink1
      a1.sinks.k1.type = hdfs
      a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_log/%Y-%m-%d
      a1.sinks.k1.hdfs.filePrefix = log-
      a1.sinks.k1.hdfs.round = false
      
      # Limit the number of small files
      a1.sinks.k1.hdfs.rollInterval = 10
      a1.sinks.k1.hdfs.rollSize = 134217728
      a1.sinks.k1.hdfs.rollCount = 0
      
      ## Write output as an LZO-compressed stream
      a1.sinks.k1.hdfs.fileType = CompressedStream
      a1.sinks.k1.hdfs.codeC = lzop
      
      ## Wiring
      a1.sources.r1.channels = c1
      a1.sinks.k1.channel= c1
      

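# Flume timestamp interceptor

By default Flume uses the Linux system time to build the HDFS output path. If a record is produced at 23:59, it may already be the next day by the time Flume consumes it from Kafka, and that record ends up under the next day's HDFS path. We want the path to follow the actual time recorded in the log, so the interceptor below extracts that time.

Approach: intercept each JSON log, parse it with fastjson, and read the actual time ts. Write ts into the event header under the key timestamp. The key must be exactly timestamp, because Flume reads this header as the event time when it resolves the HDFS path.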
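
The effect of the timestamp header on the output directory can be illustrated outside Flume. The sketch below assumes GNU date, assumes ts is in milliseconds, and formats in UTC so the result is reproducible (the HDFS sink itself resolves the path in the agent's local time zone unless configured otherwise). The function name `hdfs_dir_for_ts` is hypothetical; the path prefix is the one used in kafka-flume-hdfs.conf.

```shell
#!/bin/bash
# hdfs_dir_for_ts TS_MILLIS
# Prints the date-partitioned directory for an event timestamp in milliseconds.
# Assumes GNU date; uses UTC so the result does not depend on the local zone.
hdfs_dir_for_ts() {
  local ts_ms=$1
  local day
  day=$(date -u -d "@$((ts_ms / 1000))" +%Y-%m-%d)
  echo "/origin_data/gmall/log/topic_log/$day"
}

# An event produced at 2020-06-14 23:59:59 UTC stays in the 2020-06-14
# directory, no matter when it is consumed.
event_ts=1592179199000
hdfs_dir_for_ts "$event_ts"

# One second later the event belongs to the next day's directory.
hdfs_dir_for_ts $((event_ts + 1000))
```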

  1. Create the TimeStampInterceptor class in the top.damoncai.flume.interceptor package

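    package top.damoncai.flume.interceptor;
    
    import com.alibaba.fastjson.JSONObject;
    import org.apache.flume.Context;
    import org.apache.flume.Event;
    import org.apache.flume.interceptor.Interceptor;
    
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    
    public class TimeStampInterceptor implements Interceptor {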
    
        private ArrayList<Event> events = new ArrayList<>();
    
        @Override
        public void initialize() {
    
        }
    
        @Override
        public Event intercept(Event event) {
    
            Map<String, String> headers = event.getHeaders();
            String log = new String(event.getBody(), StandardCharsets.UTF_8);
    
            JSONObject jsonObject = JSONObject.parseObject(log);
    
            String ts = jsonObject.getString("ts");
            headers.put("timestamp", ts);
    
            return event;
        }
    
        @Override
        public List<Event> intercept(List<Event> list) {
            events.clear();
            for (Event event : list) {
                events.add(intercept(event));
            }
    
            return events;
        }
    
        @Override
        public void close() {
    
        }
    
        public static class Builder implements Interceptor.Builder {
            @Override
            public Interceptor build() {
                return new TimeStampInterceptor();
            }
    
            @Override
            public void configure(Context context) {
            }
        }
    }
    
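  2. Rebuild the package

  3. Copy the built jar into /opt/module/flume/lib on ha01 first. (If an earlier version of the jar is already there, delete it.)

  4. Distribute to ha02 and ha03

# Start/stop script for the consumer Flume

  1. On ha01, create the script f2.sh in /home/damoncai/bin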

    #! /bin/bash
    
    case $1 in
    "start"){
            for i in ha03
            do
                    echo " --------starting consumer flume on $i-------"
                    ssh $i "nohup /opt/module/flume/bin/flume-ng agent --conf-file /opt/module/flume/conf/kafka-flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,LOGFILE >/opt/module/flume/log2.txt   2>&1 &"
            done
    };;
    "stop"){
            for i in ha03
            do
                    echo " --------stopping consumer flume on $i-------"
                    ssh $i "ps -ef | grep kafka-flume-hdfs | grep -v grep |awk '{print \$2}' | xargs -n1 kill"
            done
    
    };;
    esac
    
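  2. Make the script executable

  3. Start or stop the consumer Flume with f2.sh

# Project experience: Flume memory tuning

  1. Problem: starting the consumer Flume throws the following exception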

    ERROR hdfs.HDFSEventSink: process failed
    java.lang.OutOfMemoryError: GC overhead limit exceeded
    
  2. Solution

    1. Add the following to /opt/module/flume/conf/flume-env.sh on ha01

      export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
      
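    2. Sync the configuration to ha02 and ha03

  3. Flume memory settings and tuning

    The JVM heap is usually set to 4 GB or more.

    -Xms is the initial (minimum) JVM heap size, allocated at startup; -Xmx is the maximum heap size, allocated on demand. Set the two to the same value: if -Xms is smaller, the heap can run short during startup and be resized repeatedly, which triggers frequent full GCs and hurts performance.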
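
Following that advice, the flume-env.sh line could look like the sketch below. The variable name `FLUME_HEAP` and the 4g value are assumptions for illustration; pick a size that fits the host's physical memory.

```shell
#!/bin/bash
# Hypothetical heap size; adjust to the memory actually available on the host.
FLUME_HEAP=4g

# Same value for -Xms and -Xmx so the heap is never resized at runtime.
export JAVA_OPTS="-Xms${FLUME_HEAP} -Xmx${FLUME_HEAP} -Dcom.sun.management.jmxremote"
echo "$JAVA_OPTS"
```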

# Start/stop script for the collection pipeline

  1. Create the script cluster.sh in /home/damoncai/bin

    #!/bin/bash
    
    case $1 in
    "start"){
            echo ================== starting cluster ==================
    
            # Start the Zookeeper cluster
            zk.sh start
    
            # Start the Hadoop cluster
            hdp.sh start
    
            # Start the Kafka cluster
            kf.sh start
    
            # Start the collection Flume
            f1.sh start
    
            # Start the consumer Flume
            f2.sh start
    };;
    "stop"){
            echo ================== stopping cluster ==================
    
            # Stop the consumer Flume
            f2.sh stop
    
            # Stop the collection Flume
            f1.sh stop
    
            # Stop the Kafka cluster
            kf.sh stop
    
            # Stop the Hadoop cluster
            hdp.sh stop
    
            # Stop the Zookeeper cluster
            zk.sh stop
    
    };;
    esac
    
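  2. Make the script executable

  3. Start or stop the pipeline with the script

# Common problems and solutions

# Page does not show complete information

  1. Problem

    The 2NN page at http://ha03:9868 shows no detailed information

  2. Solution

    1. Press F12 in the browser to find the cause. The bug is at line 61

    2. Find the file to modify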

      cd /opt/module/hadoop-3.1.3/share/hadoop/hdfs/webapps/static
      
      vim dfs-dust.js
      
      :set nu
      Change line 61 to:
      return new Date(Number(v)).toLocaleString();
      
    3. Distribute dfs-dust.js

      xsync dfs-dust.js
      
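    4. Force-refresh the page at http://ha03:9868/status.html

# Business data collection module

# MySQL installation

Install host: ha01

# Preparing the installation packages

  1. Upload the installation packages and the JDBC driver to /opt/software, 6 files in total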

    01_mysql-community-common-5.7.16-1.el7.x86_64.rpm
    02_mysql-community-libs-5.7.16-1.el7.x86_64.rpm
    03_mysql-community-libs-compat-5.7.16-1.el7.x86_64.rpm
    04_mysql-community-client-5.7.16-1.el7.x86_64.rpm
    05_mysql-community-server-5.7.16-1.el7.x86_64.rpm
    mysql-connector-java-5.1.27-bin.jar
    
  2. On a virtual machine, do the following

    1. Remove the bundled mysql-libs (if MySQL was installed before, remove all of it)

      rpm -qa | grep -i -E mysql\|mariadb | xargs -n1 sudo rpm -e --nodeps
      
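  3. On an Alibaba Cloud server, do the following

    Note: Alibaba Cloud servers run a minimal Linux install that lacks the tools below, so they must be installed

    1. Remove the MySQL dependency. This step is required even though MySQL is not installed on the machine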

      sudo yum remove mysql-libs
      
    2. Download and install the dependencies

      sudo yum install libaio
      sudo yum -y install autoconf
      

# Installing MySQL

  1. Install the MySQL dependencies

    sudo rpm -ivh 01_mysql-community-common-5.7.16-1.el7.x86_64.rpm
    sudo rpm -ivh 02_mysql-community-libs-5.7.16-1.el7.x86_64.rpm
    sudo rpm -ivh 03_mysql-community-libs-compat-5.7.16-1.el7.x86_64.rpm
    
  2. Install mysql-client

    sudo rpm -ivh 04_mysql-community-client-5.7.16-1.el7.x86_64.rpm
    
  3. Install mysql-server

    sudo rpm -ivh 05_mysql-community-server-5.7.16-1.el7.x86_64.rpm
    

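    Note: if the following error appears, the NOKEY line is only a warning. Since rpm 4.1, package signatures are checked automatically on install or upgrade, and this package's GPG key has not been imported. The install itself fails because the libaio dependency is missing.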

    warning: 05_mysql-community-server-5.7.16-1.el7.x86_64.rpm: Header V3 DSA/SHA1 Signature, key ID 5072e1f5: NOKEY
    error: Failed dependencies:
    libaio.so.1()(64bit) is needed by mysql-community-server-5.7.16-1.el7.x86_64
    

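    Workaround: force the install and skip the dependency check (installing libaio with yum first also resolves the missing dependency)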

    sudo rpm -ivh 05_mysql-community-server-5.7.16-1.el7.x86_64.rpm --force --nodeps
    
  4. Start MySQL

    sudo systemctl start mysqld
    
  5. Look up the MySQL password

    sudo cat /var/log/mysqld.log | grep password
    

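# Configuring MySQL

Configure MySQL so that root, with its password, can log in from any host.

  1. Log in to MySQL with the password you just found (if it fails, wrap the password in single quotes)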

    mysql -uroot -p'password'
    
  2. Set a complex password (MySQL's password policy requires it to be sufficiently complex)

    set password=password("Qs23=zs32");
    
  3. Relax the MySQL password policy

    set global validate_password_length=4;
    set global validate_password_policy=0;
    
  4. Set a simple, easy-to-remember password

    set password=password("000000");
    
  5. Switch to the mysql database

    use mysql
    
  6. Query the user table

    select user, host from user;
    
  7. Update the user table, setting the host column to %

    update user set host="%" where user="root";
    
  8. Flush privileges

    flush privileges;
    
  9. Quit

    quit;
    

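# Generating business data

# Connecting to MySQL

# Importing the SQL script

# Generating business data

  1. Create the db_log directory under /opt/module/ on ha01

  2. Upload gmall2020-mock-db-2021-01-22.jar and application.properties to /opt/module/db_log on ha01.

  3. Edit application.properties as needed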

    logging.level.root=info
    
    
    spring.datasource.driver-class-name=com.mysql.jdbc.Driver
    spring.datasource.url=jdbc:mysql://ha01:3306/gmall?characterEncoding=utf-8&useSSL=false&serverTimezone=GMT%2B8
    spring.datasource.username=root
    spring.datasource.password=000000
    
    logging.pattern.console=%m%n
    
    
    mybatis-plus.global-config.db-config.field-strategy=not_null
    
    
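    # Business date
    mock.date=2020-06-14
    # Whether to reset. Note: must be 1 on the first run; later runs that need no reset should not set it to 1
    mock.clear=1
    # Whether to reset users. Note: must be 1 on the first run; later runs that need no reset should not set it to 1
    mock.clear.user=1
    
    # Number of new users to generate
    mock.user.count=100
    # Percentage of male users
    mock.user.male-rate=20
    # Probability that user data changes
    mock.user.update-rate=20
    
    # Percentage of favorites cancelled
    mock.favor.cancel-rate=10
    # Number of favorites
    mock.favor.count=100
    
    # Probability that each user adds to the cart
    mock.cart.user-rate=50
    # Maximum number of distinct products a user adds to the cart at a time
    mock.cart.max-sku-count=8
    # Maximum quantity per product
    mock.cart.max-sku-num=3
    
    # Cart source: user search, product promotion, smart recommendation, sales campaign
    mock.cart.source-type-rate=60:20:10:10
    
    # Percentage of users placing orders
    mock.order.user-rate=50
    # Percentage of cart products that users purchase
    mock.order.sku-rate=50
    # Whether to join activities
    mock.order.join-activity=1
    # Whether to use coupons
    mock.order.use-coupon=1
    # Number of users receiving coupons
    mock.coupon.user-count=100
    
    # Payment percentage
    mock.payment.rate=70
    # Payment method: Alipay : WeChat : UnionPay
    mock.payment.payment-type=30:60:10
    
    
    # Review ratio: good : neutral : bad : automatic
    mock.comment.appraise-rate=30:10:10:50
    
    # Refund reason ratio: quality problem, description mismatch, out of stock, wrong size, ordered by mistake, changed mind, other
    mock.refund.reason-rate=30:10:20:5:15:5:5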
    
  4. Run the following command in that directory to generate data for 2020-06-14:

    java -jar gmall2020-mock-db-2021-01-22.jar
    
  5. Check the gmall database for data dated 2020-06-14

# Sqoop installation

Official site: http://sqoop.apache.org

**Download:** http://mirrors.hust.edu.cn/apache/sqoop/1.4.6/

  1. Upload the package sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz to /opt/software on ha01

  2. Extract the Sqoop package

  3. Go to /opt/module/sqoop/conf and rename the configuration file

    mv sqoop-env-template.sh sqoop-env.sh
    
  4. Edit the configuration file: vim sqoop-env.sh

    export HADOOP_COMMON_HOME=/opt/module/hadoop-3.1.3
    export HADOOP_MAPRED_HOME=/opt/module/hadoop-3.1.3
    export HIVE_HOME=/opt/module/hive
    export ZOOKEEPER_HOME=/opt/module/zookeeper-3.5.7
    export ZOOCFGDIR=/opt/module/zookeeper-3.5.7/conf
    
  5. Copy the JDBC driver

    1. Upload mysql-connector-java-5.1.48.jar to /opt/software

      cp mysql-connector-java-5.1.48.jar /opt/module/sqoop/lib/
      
    2. Verify Sqoop

      bin/sqoop help
      
  6. Test whether Sqoop can connect to the database

    bin/sqoop list-databases --connect jdbc:mysql://ha01:3306/ --username root --password 000000
    
  7. Basic Sqoop usage

    bin/sqoop import \
    --connect jdbc:mysql://ha01:3306/gmall \
    --username root \
    --password 000000 \
    --table user_info \
    --columns id,login_name \
    --where "id>=10 and id<=30" \
    --target-dir /test \
    --delete-target-dir \
    --fields-terminated-by '\t' \
    --num-mappers 2 \
    --split-by id
    

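# Synchronization strategies

Data synchronization strategies fall into four types: full sync, incremental sync, new-and-changed sync, and special cases.

  1. Full table: stores the complete data.
  2. Incremental table: stores newly added data.
  3. New-and-changed table: stores newly added data and changed data.
  4. Special table: only needs to be stored once.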
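
One way to picture the four strategies is as the filter each would apply when importing a single day of data. The following is an illustrative sketch only: the function name `sync_filter` is hypothetical, and the create_time / operate_time columns are an assumption that holds only for tables that carry both.

```shell
#!/bin/bash
# sync_filter STRATEGY DATE
# Prints a WHERE clause for one day's import. The column names follow the
# create_time / operate_time convention and are assumptions for any given table.
sync_filter() {
  local strategy=$1 do_date=$2
  case $strategy in
    full|special)
      # Full and special tables are imported without a date filter.
      echo "where 1=1" ;;
    incremental)
      # Only rows created on the given day.
      echo "where date_format(create_time,'%Y-%m-%d')='$do_date'" ;;
    new_and_changed)
      # Rows created or modified on the given day.
      echo "where date_format(create_time,'%Y-%m-%d')='$do_date' or date_format(operate_time,'%Y-%m-%d')='$do_date'" ;;
    *)
      echo "unknown strategy: $strategy" >&2
      return 1 ;;
  esac
}

sync_filter new_and_changed 2020-06-14
```

A special table uses the same filter as a full table; the difference is that it is imported once rather than every day.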

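# Full sync strategy

# Incremental sync strategy

# New-and-changed strategy

# Special strategy

Some special tables need not follow the strategies above. For example, tables that never change (region, province, ethnicity) can be stored once as fixed values.

# Importing business data into HDFS

# Analyzing each table's sync strategy

In production, some small companies simply import every table in full to keep things simple.

Medium and large companies have much more data and still follow the sync strategies strictly.

# First-day sync script for business data

# Writing the script

  1. Create the script in /home/damoncai/bin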

    vim mysql_to_hdfs_init.sh
    

    Add the following content:

    #! /bin/bash
    
    APP=gmall
    sqoop=/opt/module/sqoop/bin/sqoop
    
    if [ -n "$2" ] ;then
       do_date=$2
    else 
       echo "Please pass in the date parameter"
       exit
    fi 
    
    import_data(){
    $sqoop import \
    --connect jdbc:mysql://ha01:3306/$APP \
    --username root \
    --password 000000 \
    --target-dir /origin_data/$APP/db/$1/$do_date \
    --delete-target-dir \
    --query "$2 where \$CONDITIONS" \
    --num-mappers 1 \
    --fields-terminated-by '\t' \
    --compress \
    --compression-codec lzop \
    --null-string '\\N' \
    --null-non-string '\\N'
    
    hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /origin_data/$APP/db/$1/$do_date
    }
    
    import_order_info(){
      import_data order_info "select
                                id, 
                                total_amount, 
                                order_status, 
                                user_id, 
                                payment_way,
                                delivery_address,
                                out_trade_no, 
                                create_time, 
                                operate_time,
                                expire_time,
                                tracking_no,
                                province_id,
                                activity_reduce_amount,
                                coupon_reduce_amount,                            
                                original_total_amount,
                                feight_fee,
                                feight_fee_reduce      
                            from order_info"
    }
    
    import_coupon_use(){
      import_data coupon_use "select
                              id,
                              coupon_id,
                              user_id,
                              order_id,
                              coupon_status,
                              get_time,
                              using_time,
                              used_time,
                              expire_time
                            from coupon_use"
    }
    
    import_order_status_log(){
      import_data order_status_log "select
                                      id,
                                      order_id,
                                      order_status,
                                      operate_time
                                    from order_status_log"
    }
    
    import_user_info(){
      import_data "user_info" "select 
                                id,
                                login_name,
                                nick_name,
                                name,
                                phone_num,
                                email,
                                user_level, 
                                birthday,
                                gender,
                                create_time,
                                operate_time
                              from user_info"
    }
    
    import_order_detail(){
      import_data order_detail "select 
                                  id,
                                  order_id, 
                                  sku_id,
                                  sku_name,
                                  order_price,
                                  sku_num, 
                                  create_time,
                                  source_type,
                                  source_id,
                                  split_total_amount,
                                  split_activity_amount,
                                  split_coupon_amount
                                from order_detail"
    }
    
    import_payment_info(){
      import_data "payment_info"  "select 
                                    id,  
                                    out_trade_no, 
                                    order_id, 
                                    user_id, 
                                    payment_type, 
                                    trade_no, 
                                    total_amount,  
                                    subject, 
                                    payment_status,
                                    create_time,
                                    callback_time 
                                  from payment_info"
    }
    
    import_comment_info(){
      import_data comment_info "select
                                  id,
                                  user_id,
                                  sku_id,
                                  spu_id,
                                  order_id,
                                  appraise,
                                  create_time
                                from comment_info"
    }
    
    import_order_refund_info(){
      import_data order_refund_info "select
                                    id,
                                    user_id,
                                    order_id,
                                    sku_id,
                                    refund_type,
                                    refund_num,
                                    refund_amount,
                                    refund_reason_type,
                                    refund_status,
                                    create_time
                                  from order_refund_info"
    }
    
    import_sku_info(){
      import_data sku_info "select 
                              id,
                              spu_id,
                              price,
                              sku_name,
                              sku_desc,
                              weight,
                              tm_id,
                              category3_id,
                              is_sale,
                              create_time
                            from sku_info"
    }
    
    import_base_category1(){
      import_data "base_category1" "select 
                                      id,
                                      name 
                                    from base_category1"
    }
    
    import_base_category2(){
      import_data "base_category2" "select
                                      id,
                                      name,
                                      category1_id 
                                    from base_category2"
    }
    
    import_base_category3(){
      import_data "base_category3" "select
                                      id,
                                      name,
                                      category2_id
                                    from base_category3"
    }
    
    import_base_province(){
      import_data base_province "select
                                  id,
                                  name,
                                  region_id,
                                  area_code,
                                  iso_code,
                                  iso_3166_2
                                from base_province"
    }
    
    
    import_base_region(){
      import_data base_region "select
                                  id,
                                  region_name
                                from base_region"
    }
    
    import_base_trademark(){
      import_data base_trademark "select
                                    id,
                                    tm_name
                                  from base_trademark"
    }
    
    import_spu_info(){
      import_data spu_info "select
                                id,
                                spu_name,
                                category3_id,
                                tm_id
                              from spu_info"
    }
    
    import_favor_info(){
      import_data favor_info "select
                              id,
                              user_id,
                              sku_id,
                              spu_id,
                              is_cancel,
                              create_time,
                              cancel_time
                            from favor_info"
    }
    
    import_cart_info(){
      import_data cart_info "select
                            id,
                            user_id,
                            sku_id,
                            cart_price,
                            sku_num,
                            sku_name,
                            create_time,
                            operate_time,
                            is_ordered,
                            order_time,
                            source_type,
                            source_id
                          from cart_info"
    }
    
    import_coupon_info(){
      import_data coupon_info "select
                              id,
                              coupon_name,
                              coupon_type,
                              condition_amount,
                              condition_num,
                              activity_id,
                              benefit_amount,
                              benefit_discount,
                              create_time,
                              range_type,
                              limit_num,
                              taken_count,
                              start_time,
                              end_time,
                              operate_time,
                              expire_time
                            from coupon_info"
    }
    
    import_activity_info(){
      import_data activity_info "select
                                  id,
                                  activity_name,
                                  activity_type,
                                  start_time,
                                  end_time,
                                  create_time
                                from activity_info"
    }
    
    import_activity_rule(){
        import_data activity_rule "select
                                        id,
                                        activity_id,
                                        activity_type,
                                        condition_amount,
                                        condition_num,
                                        benefit_amount,
                                        benefit_discount,
                                        benefit_level
                                    from activity_rule"
    }
    
    import_base_dic(){
        import_data base_dic "select
                                dic_code,
                                dic_name,
                                parent_code,
                                create_time,
                                operate_time
                              from base_dic"
    }
    
    
    import_order_detail_activity(){
        import_data order_detail_activity "select
                                                                    id,
                                                                    order_id,
                                                                    order_detail_id,
                                                                    activity_id,
                                                                    activity_rule_id,
                                                                    sku_id,
                                                                    create_time
                                                                from order_detail_activity"
    }
    
    
    import_order_detail_coupon(){
        import_data order_detail_coupon "select
                                                                    id,
                                                                    order_id,
                                                                    order_detail_id,
                                                                    coupon_id,
                                                                    coupon_use_id,
                                                                    sku_id,
                                                                    create_time
                                                                from order_detail_coupon"
    }
    
    
    import_refund_payment(){
        import_data refund_payment "select
                                                            id,
                                                            out_trade_no,
                                                            order_id,
                                                            sku_id,
                                                            payment_type,
                                                            trade_no,
                                                            total_amount,
                                                            subject,
                                                            refund_status,
                                                            create_time,
                                                            callback_time
                                                        from refund_payment"
    }
    
    import_sku_attr_value(){
        import_data sku_attr_value "select
                                                        id,
                                                        attr_id,
                                                        value_id,
                                                        sku_id,
                                                        attr_name,
                                                        value_name
                                                    from sku_attr_value"
    }
    
    
    import_sku_sale_attr_value(){
        import_data sku_sale_attr_value "select
                                                                id,
                                                                sku_id,
                                                                spu_id,
                                                                sale_attr_value_id,
                                                                sale_attr_id,
                                                                sale_attr_name,
                                                                sale_attr_value_name
                                                            from sku_sale_attr_value"
    }
    
    case $1 in
      "order_info")
         import_order_info
    ;;
      "base_category1")
         import_base_category1
    ;;
      "base_category2")
         import_base_category2
    ;;
      "base_category3")
         import_base_category3
    ;;
      "order_detail")
         import_order_detail
    ;;
      "sku_info")
         import_sku_info
    ;;
      "user_info")
         import_user_info
    ;;
      "payment_info")
         import_payment_info
    ;;
      "base_province")
         import_base_province
    ;;
      "base_region")
         import_base_region
    ;;
      "base_trademark")
         import_base_trademark
    ;;
      "activity_info")
          import_activity_info
    ;;
      "cart_info")
          import_cart_info
    ;;
      "comment_info")
          import_comment_info
    ;;
      "coupon_info")
          import_coupon_info
    ;;
     "coupon_use")
          import_coupon_use
    ;;
      "favor_info")
          import_favor_info
    ;;
      "order_refund_info")
          import_order_refund_info
    ;;
      "order_status_log")
          import_order_status_log
    ;;
      "spu_info")
          import_spu_info
    ;;
      "activity_rule")
          import_activity_rule
    ;;
      "base_dic")
          import_base_dic
    ;;
      "order_detail_activity")
          import_order_detail_activity
    ;;
      "order_detail_coupon")
          import_order_detail_coupon
    ;;
      "refund_payment")
          import_refund_payment
    ;;
      "sku_attr_value")
          import_sku_attr_value
    ;;
      "sku_sale_attr_value")
          import_sku_sale_attr_value
    ;;
      "all")
       import_base_category1
       import_base_category2
       import_base_category3
       import_order_info
       import_order_detail
       import_sku_info
       import_user_info
       import_payment_info
       import_base_region
       import_base_province
       import_base_trademark
       import_activity_info
       import_cart_info
       import_comment_info
       import_coupon_use
       import_coupon_info
       import_favor_info
       import_order_refund_info
       import_order_status_log
       import_spu_info
       import_activity_rule
       import_base_dic
       import_order_detail_activity
       import_order_detail_coupon
       import_refund_payment
       import_sku_attr_value
       import_sku_sale_attr_value
    ;;
    esac
    

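Note 1:

  [ -n value ] tests whether the value of a variable is empty.

  -- If the value is non-empty, the test returns true.

  -- If the value is empty, the test returns false.

  Quote the variable, as in [ -n "$2" ]: with an unquoted empty variable the test collapses to [ -n ], which returns true.

Note 2:

  For the options of the date command, run date --help.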
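
The two notes above can be tried in isolation. The following is a minimal sketch, assuming GNU date (the -d option behaves differently on BSD/macOS). The helper name resolve_do_date is hypothetical and does not appear in the original script; it only mirrors the if/else block that sets do_date.

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): returns the date that was
# passed in, or yesterday's date when no argument was given.
resolve_do_date() {
    if [ -n "$1" ]; then
        # Non-empty argument: [ -n ... ] is true, so use the value as-is
        echo "$1"
    else
        # Empty or missing argument: fall back to yesterday (GNU date syntax)
        date -d '-1 day' +%F
    fi
}

resolve_do_date 2020-06-14   # prints 2020-06-14
resolve_do_date              # prints yesterday's date in YYYY-MM-DD form
```

The +%F format is shorthand for %Y-%m-%d, which matches the date_format pattern used in the SQL filters of the daily script.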

  1. Make the script executable

    chmod +x mysql_to_hdfs_init.sh

  2. Run the script

    mysql_to_hdfs_init.sh all 2020-06-14

# Daily business data sync script

  1. Create the script in the /home/damoncai/bin directory

    cd /home/damoncai/bin
    vim mysql_to_hdfs.sh

  2. Add the following content

    #! /bin/bash
    
    APP=gmall
    sqoop=/opt/module/sqoop/bin/sqoop
    
    if [ -n "$2" ] ;then
        do_date=$2
    else
        do_date=`date -d '-1 day' +%F`
    fi
    
    import_data(){
    $sqoop import \
    --connect jdbc:mysql://ha01:3306/$APP \
    --username root \
    --password 000000 \
    --target-dir /origin_data/$APP/db/$1/$do_date \
    --delete-target-dir \
    --query "$2 and  \$CONDITIONS" \
    --num-mappers 1 \
    --fields-terminated-by '\t' \
    --compress \
    --compression-codec lzop \
    --null-string '\\N' \
    --null-non-string '\\N'
    
    hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /origin_data/$APP/db/$1/$do_date
    }
    
    import_order_info(){
      import_data order_info "select
                                id, 
                                total_amount, 
                                order_status, 
                                user_id, 
                                payment_way,
                                delivery_address,
                                out_trade_no, 
                                create_time, 
                                operate_time,
                                expire_time,
                                tracking_no,
                                province_id,
                                activity_reduce_amount,
                                coupon_reduce_amount,                            
                                original_total_amount,
                                feight_fee,
                                feight_fee_reduce      
                            from order_info
                            where (date_format(create_time,'%Y-%m-%d')='$do_date' 
                            or date_format(operate_time,'%Y-%m-%d')='$do_date')"
    }
    
    import_coupon_use(){
      import_data coupon_use "select
                              id,
                              coupon_id,
                              user_id,
                              order_id,
                              coupon_status,
                              get_time,
                              using_time,
                              used_time,
                              expire_time
                            from coupon_use
                            where (date_format(get_time,'%Y-%m-%d')='$do_date'
                            or date_format(using_time,'%Y-%m-%d')='$do_date'
                            or date_format(used_time,'%Y-%m-%d')='$do_date'
                            or date_format(expire_time,'%Y-%m-%d')='$do_date')"
    }
    
    import_order_status_log(){
      import_data order_status_log "select
                                      id,
                                      order_id,
                                      order_status,
                                      operate_time
                                    from order_status_log
                                    where date_format(operate_time,'%Y-%m-%d')='$do_date'"
    }
    
    import_user_info(){
      import_data "user_info" "select 
                                id,
                                login_name,
                                nick_name,
                                name,
                                phone_num,
                                email,
                                user_level, 
                                birthday,
                                gender,
                                create_time,
                                operate_time
                              from user_info 
                              where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date' 
                              or DATE_FORMAT(operate_time,'%Y-%m-%d')='$do_date')"
    }
    
    import_order_detail(){
      import_data order_detail "select 
                                  id,
                                  order_id, 
                                  sku_id,
                                  sku_name,
                                  order_price,
                                  sku_num, 
                                  create_time,
                                  source_type,
                                  source_id,
                                  split_total_amount,
                                  split_activity_amount,
                                  split_coupon_amount
                                from order_detail 
                                where DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date'"
    }
    
    import_payment_info(){
      import_data "payment_info"  "select 
                                    id,  
                                    out_trade_no, 
                                    order_id, 
                                    user_id, 
                                    payment_type, 
                                    trade_no, 
                                    total_amount,  
                                    subject, 
                                    payment_status,
                                    create_time,
                                    callback_time 
                                  from payment_info 
                                  where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date' 
                                  or DATE_FORMAT(callback_time,'%Y-%m-%d')='$do_date')"
    }
    
    import_comment_info(){
      import_data comment_info "select
                                  id,
                                  user_id,
                                  sku_id,
                                  spu_id,
                                  order_id,
                                  appraise,
                                  create_time
                                from comment_info
                                where date_format(create_time,'%Y-%m-%d')='$do_date'"
    }
    
    import_order_refund_info(){
      import_data order_refund_info "select
                                    id,
                                    user_id,
                                    order_id,
                                    sku_id,
                                    refund_type,
                                    refund_num,
                                    refund_amount,
                                    refund_reason_type,
                                    refund_status,
                                    create_time
                                  from order_refund_info
                                  where date_format(create_time,'%Y-%m-%d')='$do_date'"
    }
    
    import_sku_info(){
      import_data sku_info "select 
                              id,
                              spu_id,
                              price,
                              sku_name,
                              sku_desc,
                              weight,
                              tm_id,
                              category3_id,
                              is_sale,
                              create_time
                            from sku_info where 1=1"
    }
    
    import_base_category1(){
      import_data "base_category1" "select 
                                      id,
                                      name 
                                    from base_category1 where 1=1"
    }
    
    import_base_category2(){
      import_data "base_category2" "select
                                      id,
                                      name,
                                      category1_id 
                                    from base_category2 where 1=1"
    }
    
    import_base_category3(){
      import_data "base_category3" "select
                                      id,
                                      name,
                                      category2_id
                                    from base_category3 where 1=1"
    }
    
    import_base_province(){
      import_data base_province "select
                                  id,
                                  name,
                                  region_id,
                                  area_code,
                                  iso_code,
                                  iso_3166_2
                                from base_province
                                where 1=1"
    }
    
    import_base_region(){
      import_data base_region "select
                                  id,
                                  region_name
                                from base_region
                                where 1=1"
    }
    
    import_base_trademark(){
      import_data base_trademark "select
                                    id,
                                    tm_name
                                  from base_trademark
                                  where 1=1"
    }
    
    import_spu_info(){
      import_data spu_info "select
                                id,
                                spu_name,
                                category3_id,
                                tm_id
                              from spu_info
                              where 1=1"
    }
    
    
    import_favor_info(){
      import_data favor_info "select
                              id,
                              user_id,
                              sku_id,
                              spu_id,
                              is_cancel,
                              create_time,
                              cancel_time
                            from favor_info
                            where 1=1"
    }
    
    import_cart_info(){
      import_data cart_info "select
                            id,
                            user_id,
                            sku_id,
                            cart_price,
                            sku_num,
                            sku_name,
                            create_time,
                            operate_time,
                            is_ordered,
                            order_time,
                            source_type,
                            source_id
                          from cart_info
                          where 1=1"
    }
    
    import_coupon_info(){
      import_data coupon_info "select
                              id,
                              coupon_name,
                              coupon_type,
                              condition_amount,
                              condition_num,
                              activity_id,
                              benefit_amount,
                              benefit_discount,
                              create_time,
                              range_type,
                              limit_num,
                              taken_count,
                              start_time,
                              end_time,
                              operate_time,
                              expire_time
                            from coupon_info
                            where 1=1"
    }
    
    import_activity_info(){
      import_data activity_info "select
                                  id,
                                  activity_name,
                                  activity_type,
                                  start_time,
                                  end_time,
                                  create_time
                                from activity_info
                                where 1=1"
    }
    
    import_activity_rule(){
        import_data activity_rule "select
                                        id,
                                        activity_id,
                                        activity_type,
                                        condition_amount,
                                        condition_num,
                                        benefit_amount,
                                        benefit_discount,
                                        benefit_level
                                    from activity_rule
                                    where 1=1"
    }
    
    import_base_dic(){
        import_data base_dic "select
                                dic_code,
                                dic_name,
                                parent_code,
                                create_time,
                                operate_time
                              from base_dic
                              where 1=1"
    }
    
    
    import_order_detail_activity(){
        import_data order_detail_activity "select
                                                                    id,
                                                                    order_id,
                                                                    order_detail_id,
                                                                    activity_id,
                                                                    activity_rule_id,
                                                                    sku_id,
                                                                    create_time
                                                                from order_detail_activity
                                                                where date_format(create_time,'%Y-%m-%d')='$do_date'"
    }
    
    
    import_order_detail_coupon(){
        import_data order_detail_coupon "select
                                                                    id,
                                                                    order_id,
                                                                    order_detail_id,
                                                                    coupon_id,
                                                                    coupon_use_id,
                                                                    sku_id,
                                                                    create_time
                                                                from order_detail_coupon
                                                                where date_format(create_time,'%Y-%m-%d')='$do_date'"
    }
    
    
    import_refund_payment(){
        import_data refund_payment "select
                                                            id,
                                                            out_trade_no,
                                                            order_id,
                                                            sku_id,
                                                            payment_type,
                                                            trade_no,
                                                            total_amount,
                                                            subject,
                                                            refund_status,
                                                            create_time,
                                                            callback_time
                                                        from refund_payment
                                                        where (DATE_FORMAT(create_time,'%Y-%m-%d')='$do_date'
                                                        or DATE_FORMAT(callback_time,'%Y-%m-%d')='$do_date')"
    }
    
    import_sku_attr_value(){
        import_data sku_attr_value "select
                                                        id,
                                                        attr_id,
                                                        value_id,
                                                        sku_id,
                                                        attr_name,
                                                        value_name
                                                    from sku_attr_value
                                                    where 1=1"
    }
    
    
    import_sku_sale_attr_value(){
        import_data sku_sale_attr_value "select
                                                                id,
                                                                sku_id,
                                                                spu_id,
                                                                sale_attr_value_id,
                                                                sale_attr_id,
                                                                sale_attr_name,
                                                                sale_attr_value_name
                                                            from sku_sale_attr_value
                                                            where 1=1"
    }
    
    case $1 in
      "order_info")
         import_order_info
    ;;
      "base_category1")
         import_base_category1
    ;;
      "base_category2")
         import_base_category2
    ;;
      "base_category3")
         import_base_category3
    ;;
      "order_detail")
         import_order_detail
    ;;
      "sku_info")
         import_sku_info
    ;;
      "user_info")
         import_user_info
    ;;
      "payment_info")
         import_payment_info
    ;;
      "base_province")
         import_base_province
    ;;
      "base_region")
         import_base_region
    ;;
      "base_trademark")
         import_base_trademark
    ;;
      "activity_info")
          import_activity_info
    ;;
      "cart_info")
          import_cart_info
    ;;
      "comment_info")
          import_comment_info
    ;;
      "coupon_info")
          import_coupon_info
    ;;
      "coupon_use")
          import_coupon_use
    ;;
      "favor_info")
          import_favor_info
    ;;
      "order_refund_info")
          import_order_refund_info
    ;;
      "order_status_log")
          import_order_status_log
    ;;
      "spu_info")
          import_spu_info
    ;;
      "activity_rule")
          import_activity_rule
    ;;
      "base_dic")
          import_base_dic
    ;;
      "order_detail_activity")
          import_order_detail_activity
    ;;
      "order_detail_coupon")
          import_order_detail_coupon
    ;;
      "refund_payment")
          import_refund_payment
    ;;
      "sku_attr_value")
          import_sku_attr_value
    ;;
      "sku_sale_attr_value")
          import_sku_sale_attr_value
    ;;
      "all")
       import_base_category1
       import_base_category2
       import_base_category3
       import_order_info
       import_order_detail
       import_sku_info
       import_user_info
       import_payment_info
       import_base_trademark
       import_activity_info
       import_cart_info
       import_comment_info
       import_coupon_use
       import_coupon_info
       import_favor_info
       import_order_refund_info
       import_order_status_log
       import_spu_info
       import_activity_rule
       import_base_dic
       import_order_detail_activity
       import_order_detail_coupon
       import_refund_payment
       import_sku_attr_value
       import_sku_sale_attr_value
    ;;
    esac
    
  3. Make the script executable

    chmod +x mysql_to_hdfs.sh

  4. Run the script

    mysql_to_hdfs.sh all 2020-06-15

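# Project experience

Hive stores NULL as "\N" in its underlying files, whereas MySQL stores NULL as a real NULL. To keep the data consistent on both sides, use the Sqoop options --input-null-string and --input-null-non-string when exporting data, and --null-string and --null-non-string when importing data.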
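
As a rough illustration of the options named above, the sketch below prints the NULL-handling flags for each direction. The function name `sqoop_null_flags` is hypothetical and not part of the original scripts, and the `\\N` argument form is an assumption based on common Sqoop usage; check it against your Sqoop version.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original scripts): print the Sqoop
# NULL-handling options for one direction.
#   import: MySQL -> HDFS/Hive, write NULL as \N so Hive reads it as NULL
#   export: HDFS/Hive -> MySQL, read \N back as NULL
sqoop_null_flags() {
  case "$1" in
    import) printf '%s\n' '--null-string \\N --null-non-string \\N' ;;
    export) printf '%s\n' '--input-null-string \\N --input-null-non-string \\N' ;;
    *) echo "usage: sqoop_null_flags import|export" >&2; return 1 ;;
  esac
}

sqoop_null_flags import
sqoop_null_flags export
```

The printed flags would be appended to the corresponding `sqoop import` or `sqoop export` command line.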

# Data environment preparation

# Installing and deploying Hive

  1. Upload apache-hive-3.1.2-bin.tar.gz to the /opt/software directory on the Linux host

  2. Extract apache-hive-3.1.2-bin.tar.gz into /opt/module/

  3. Rename the extracted apache-hive-3.1.2-bin directory to hive

  4. Edit /etc/profile.d/my_env.sh and add the environment variables

    sudo vim /etc/profile.d/my_env.sh

    #HIVE_HOME
    export HIVE_HOME=/opt/module/hive
    export PATH=$PATH:$HIVE_HOME/bin

  5. Source /etc/profile.d/my_env.sh so the environment variables take effect

    source /etc/profile.d/my_env.sh

  6. Resolve the logging jar conflict: go to /opt/module/hive/lib and rename the jar

    mv log4j-slf4j-impl-2.10.0.jar log4j-slf4j-impl-2.10.0.jar.bak

# Configuring the Hive metastore in MySQL

  1. Copy the driver (copy the MySQL JDBC driver into Hive's lib directory)

    cp /opt/software/mysql-connector-java-5.1.27.jar /opt/module/hive/lib/

  2. Point the Metastore at MySQL

    1. Create hive-site.xml in the $HIVE_HOME/conf directory

      vim hive-site.xml

    2. Add the following content

      <?xml version="1.0"?>
      <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
      <configuration>
          <property>
              <name>javax.jdo.option.ConnectionURL</name>
              <value>jdbc:mysql://ha01:3306/metastore?useSSL=false</value>
          </property>
      
          <property>
              <name>javax.jdo.option.ConnectionDriverName</name>
              <value>com.mysql.jdbc.Driver</value>
          </property>
      
          <property>
              <name>javax.jdo.option.ConnectionUserName</name>
              <value>root</value>
          </property>
      
          <property>
              <name>javax.jdo.option.ConnectionPassword</name>
              <value>000000</value>
          </property>
      
          <property>
              <name>hive.metastore.warehouse.dir</name>
              <value>/user/hive/warehouse</value>
          </property>
      
          <property>
              <name>hive.metastore.schema.verification</name>
              <value>false</value>
          </property>
      
          <property>
              <name>hive.server2.thrift.port</name>
              <value>10000</value>
          </property>
      
          <property>
              <name>hive.server2.thrift.bind.host</name>
              <value>ha01</value>
          </property>
      
          <property>
              <name>hive.metastore.event.db.notification.api.auth</name>
              <value>false</value>
          </property>
          
          <property>
              <name>hive.cli.print.header</name>
              <value>true</value>
          </property>
      
          <property>
              <name>hive.cli.print.current.db</name>
              <value>true</value>
          </property>
      </configuration>
      

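# Starting Hive

  1. Log in to MySQL

    mysql -uroot -p000000

  2. Create the Hive metastore database

    create database metastore;

  3. Initialize the Hive metastore schema

    schematool -initSchema -dbType mysql -verbose

  4. Start the Hive client (run from /opt/module/hive)

    bin/hive

  5. List the databases

    show databases;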

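# Data warehouse layering

# Why layer the warehouse

# Data marts versus data warehouses

# Warehouse naming conventions

# Table naming

  1. ODS-layer tables are named ods_<table name>

  2. DIM-layer tables are named dim_<table name>

  3. DWD-layer tables are named dwd_<table name>

  4. DWS-layer tables are named dws_<table name>

  5. DWT-layer tables are named dwt_<table name>

  6. ADS-layer tables are named ads_<table name>

  7. Temporary tables are named tmp_<table name>

# Script naming

  1. <data source>_to_<target>_db/log.sh
  2. User-behavior scripts use the suffix log; business-data scripts use the suffix db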
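
To make the script naming pattern concrete, here is a minimal sketch that composes a name from its three parts. The helper `script_name` is hypothetical and exists only for illustration.

```shell
#!/bin/bash
# Hypothetical helper: compose a script name as <source>_to_<target>_<suffix>.sh,
# where the suffix is "log" for user-behavior data or "db" for business data.
script_name() {
  case "$3" in
    log|db) printf '%s_to_%s_%s.sh\n' "$1" "$2" "$3" ;;
    *) echo "suffix must be log or db" >&2; return 1 ;;
  esac
}

script_name hdfs ods log
script_name hdfs ods db
```

The first call yields hdfs_to_ods_log.sh, which is the name of the log load script used later in this document.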

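# Column types

  1. Quantities are bigint
  2. Monetary amounts are decimal(16, 2): 16 significant digits, 2 of them after the decimal point
  3. Strings (names, descriptions, and so on) are string
  4. Primary and foreign keys are string
  5. Timestamps are bigint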

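# Data warehouse theory

# Normal form theory

# The concept of normal forms

  1. Data modeling has to follow certain rules; in relational modeling those rules are the normal forms.
  2. Purpose: applying normal forms reduces data redundancy.
  3. Drawback: retrieving data requires joins to assemble the final result.
  4. Classification: the normal forms in common use are first normal form (1NF), second normal form (2NF), third normal form (3NF), Boyce-Codd normal form (BCNF), fourth normal form (4NF), and fifth normal form (5NF).

# Functional dependency

# Distinguishing the three normal forms

  1. First normal form

  2. Second normal form

  3. Third normal form

# Relational modeling and dimensional modeling

Relational modeling and dimensional modeling are two techniques for modeling a data warehouse. Relational modeling is advocated by Bill Inmon; dimensional modeling is advocated by Ralph Kimball.

# Relational modeling

Relational modeling abstracts complex data into two concepts, entities and relationships, and represents them in normalized form. As the relational model figure shows, the result is fairly loose and fragmented, with a large number of physical tables.

# Dimensional modeling

As the dimensional model figure shows, the model is comparatively clear and concise.

A dimensional model starts from the needs of data analysis and does not follow third normal form, so it carries some redundancy. It is business-oriented and presents the business through fact tables and dimension tables. Because the table structure is simple, queries are simple and comparatively efficient.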

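# Dimension tables and fact tables (key topic)

# Dimension tables

A dimension table generally holds descriptive information about facts. Each dimension table corresponds to an object or concept in the real world, for example users, products, dates, or regions.

Characteristics of dimension tables:

  1. They are wide (many attributes, so many columns)
  2. They have relatively few rows compared with fact tables: usually fewer than 100,000
  3. Their content is relatively fixed, as in a code table

# Fact tables

Each row in a fact table represents one business event (placing an order, paying, refunding, reviewing, and so on). The term "fact" refers to the measures of the business event (countable occurrences, quantities, amounts, and so on). For example: on May 21, 2020, Teacher Songsong spent 250 yuan on JD.com to buy one bottle of seal and ginseng pills. Dimension tables: time, user, product, merchant. Fact table: 250 yuan, one bottle.

Each fact table row contains additive numeric measures and foreign keys that join to the dimension tables, usually two or more foreign keys.

Characteristics of fact tables:

  1. They are very large
  2. They are relatively narrow: few columns (mainly foreign key ids and measures)
  3. They change often, with many new rows added every day

# Transaction fact tables

Each row represents one transaction or event, for example one sales order record or one payment record. Once the transaction is committed and the row is inserted into the fact table, the data is not changed again; the table is updated incrementally.

# Periodic snapshot fact tables

A periodic snapshot fact table does not keep every record; it keeps only the data at fixed intervals, for example daily or monthly sales, or the monthly account balance.

Take the shopping cart: items are added and removed and the cart can change at any moment, but what matters more is how many items it holds at the end of each day, which makes later statistical analysis easier.

# Accumulating snapshot fact tables

**Accumulating snapshot fact tables track how a business fact changes over time.** For example, the data warehouse may need to accumulate or store the timestamps of each business stage of an order, from placing the order through packing, shipping, and receipt, in order to follow the progress of the order life cycle. As the business process advances, the corresponding fact table row is updated as well.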

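# Types of dimensional model

Dimensional modeling is further divided into three models: the star schema, the snowflake schema, and the constellation schema.

# Star schema

# Snowflake schema

# Constellation schema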

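# Data warehouse modeling (absolutely key)

# ODS layer

  1. User-behavior data on HDFS
  2. Business data on HDFS
  3. How should the user-behavior data and business data on HDFS be handled?
    1. Keep the data exactly as it arrived, without any modification, so that this layer serves as a backup
    2. Compress the data to save disk space (for example, 100 GB of raw data can be compressed to about 10 GB)
    3. Create partitioned tables to avoid full table scans later

# DIM layer and DWD layer

The DIM and DWD layers need a dimensional model. A star schema is generally used, and the overall result generally takes the form of a constellation schema.

Dimensional modeling generally follows these four steps:

Select the business process → declare the grain → identify the dimensions → identify the facts

  1. Select the business process

    From the business system, pick the lines of business of interest, such as ordering, payment, refunds, and logistics. Each line of business corresponds to one fact table.

  2. Declare the grain

    Data granularity is the level of detail or aggregation at which data is stored in the data warehouse.

    Declaring the grain means defining precisely what one row of the fact table represents. Choose the finest grain possible so that the table can serve a wide variety of requirements.

    Typical grain declarations:

    One row of the order fact table represents one product item within one order.

    One row of the payment fact table represents one payment record.

  3. Identify the dimensions

    The main role of dimensions is to describe business facts; they mainly express information such as "who, where, and when".

    The rule for identifying dimensions is whether later requirements will analyze metrics along that dimension. For example, to find at what times, in which regions, and by which users the most orders are placed, the dimensions to identify include time, region, and user.

  4. Identify the facts

    Here the word "fact" refers to the measures in the business (counts, quantities, numbers of items, and amounts, all of which can be summed), for example order amount or number of orders.

    In the DWD layer, modeling is driven by business processes: based on the characteristics of each specific business process, build detail-level fact tables at the finest grain. Fact tables may be widened moderately.

    The association between fact tables and dimension tables is flexible, but to handle more complex business requirements, associate every table that can be associated.

At this point the dimensional modeling of the data warehouse is complete; the DWD layer is driven by business processes.

The DWS, DWT, and ADS layers are all driven by requirements and are no longer related to dimensional modeling.

DWS and DWT both build wide tables, organized by subject. A subject is the angle from which a question is viewed, and it corresponds to a dimension table.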

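# DWS layer and DWT layer

The DWS and DWT layers are collectively called the wide-table layers. Their design ideas are largely the same and are explained through the following example.

  1. The problem: two requirements, counting the orders in each province and totaling the order amount in each province

  2. The approach: both join the province table with the order table, group by province, and then compute. The same data is processed twice, and in practice there are many more scenarios like this.

  3. How can the design avoid the repeated computation?

  4. For this scenario, design a region wide table whose primary key is the region ID and whose columns include order count, order amount, payment count, payment amount, and so on. All of these metrics are computed together and the results are stored in the wide table, which effectively avoids recomputing the same data.

  5. Summary:

    1. Which wide tables to build: use the dimensions as the baseline
    2. Which columns the wide tables contain: look at the fact tables from the viewpoint of each dimension, focusing on the measures of the fact tables after aggregation.
    3. The difference between DWS and DWT: the DWS layer stores each subject's aggregated behavior for the current day, for example each region's order count and order amount for the day; the DWT layer stores each subject's cumulative behavior, for example each region's order count and order amount over the last 7 days (or 15, 30, or 60 days).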
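
The wide-table idea, computing several metrics for one dimension in a single pass, can be sketched outside Hive with awk. This is only an illustration under stated assumptions: the input format (tab-separated province id and order amount, one row per order) and the function name `build_region_wide` are hypothetical, and the real warehouse would do this in HiveQL.

```shell
#!/bin/bash
# Hypothetical input on stdin: "province_id<TAB>order_amount", one row per order.
# A single pass yields both metrics per province (order count and total amount),
# instead of scanning the same rows once per requirement.
build_region_wide() {
  awk -F'\t' '
    { cnt[$1] += 1; amt[$1] += $2 }
    END { for (p in cnt) printf "%s\t%d\t%.2f\n", p, cnt[p], amt[p] }
  ' | sort
}

printf '1\t10.50\n2\t5.00\n1\t4.50\n' | build_region_wide
```

Each output row plays the role of one row of the region wide table: the key, then the order count, then the order amount.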

# ADS layer

Analyze the metrics of each major subject of the e-commerce system.

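# Setting up the data warehouse environment

# Setting up the Hive environment

# Introduction to Hive engines

Hive supports the following engines: MR (the default), Tez, and Spark.

Hive on Spark: Hive stores the metadata and also parses and optimizes the SQL; the syntax is HQL; the execution engine becomes Spark, which runs the job using RDDs.

Spark on Hive: Hive only stores the metadata; Spark parses and optimizes the SQL; the syntax is Spark SQL; Spark runs the job using RDDs.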

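# Hive on Spark configuration

  1. Compatibility note

    Note: Hive 3.1.2 and Spark 3.0.0 as downloaded from the official sites are not compatible by default, because the Spark version that Hive 3.1.2 supports is 2.4.5. Hive 3.1.2 therefore has to be recompiled.

    Build steps: download the Hive 3.1.2 source from the official site and change the Spark version referenced in the pom file to 3.0.0. If the build succeeds, package it to obtain the jars. If it fails, modify the affected methods as the error messages indicate until the build succeeds, then package it to obtain the jars.

  2. Deploy Spark on the node where Hive runs

    If Spark has already been deployed, this step can be skipped, but check that the SPARK_HOME environment variable is set correctly.

    1. Download the Spark package from the official site: http://spark.apache.org/downloads.html

    2. Upload and extract spark-3.0.0-bin-hadoop3.2.tgz

    3. Configure the SPARK_HOME environment variable

      sudo vim /etc/profile.d/my_env.sh

      Add the following content

      # SPARK_HOME
      export SPARK_HOME=/opt/module/spark
      export PATH=$PATH:$SPARK_HOME/bin

    4. Source the file so the change takes effect

      source /etc/profile.d/my_env.sh

    5. Create the Spark configuration file in Hive

      vim /opt/module/hive/conf/spark-defaults.conf

      spark.master                               yarn
      spark.eventLog.enabled                     true
      spark.eventLog.dir                         hdfs://ha01:8020/spark-history
      spark.executor.memory                      1g
      spark.driver.memory                        1g

    6. Create the following path on HDFS to store the history logs

      hadoop fs -mkdir /spark-history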
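    7. Upload the "pure" Spark jars (the build without Hadoop) to HDFS

      Note 1: the regular Spark 3.0.0 build supports Hive 2.3.7 by default, so using it directly causes compatibility problems with the installed Hive 3.1.2. The pure Spark build is used instead; it does not bundle the Hadoop and Hive dependencies, which avoids the conflict.

      Note 2: Hive jobs are ultimately executed by Spark, and the resources for Spark jobs are scheduled by Yarn, so a job may be assigned to any node in the cluster. The Spark dependencies therefore have to be uploaded to a path on HDFS so that every node in the cluster can fetch them.

      1. Upload and extract spark-3.0.0-bin-without-hadoop.tgz

        tar -zxvf /opt/software/spark-3.0.0-bin-without-hadoop.tgz

      2. Upload the pure Spark jars to HDFS

        hadoop fs -mkdir /spark-jars

        hadoop fs -put spark-3.0.0-bin-without-hadoop/jars/* /spark-jars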
    8. Edit hive-site.xml

      vim /opt/module/hive/conf/hive-site.xml

      <!-- Location of the Spark dependencies (note: port 8020 must match the NameNode port) -->
      <property>
          <name>spark.yarn.jars</name>
          <value>hdfs://ha01:8020/spark-jars/*</value>
      </property>

      <!-- Hive execution engine -->
      <property>
          <name>hive.execution.engine</name>
          <value>spark</value>
      </property>

# Testing Hive on Spark

  1. Start the Hive client

    bin/hive

  2. Create a test table

    create table student(id int, name string);

  3. Test with an insert

    insert into table student values(1,'abc');

    -- The first run is slow because a Spark session has to be created

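# Yarn configuration

# Increasing the ApplicationMaster resource share

The capacity scheduler limits the resources that the concurrently running Application Masters in each queue may occupy. The limit is set by yarn.scheduler.capacity.maximum-am-resource-percent, whose default is 0.1, meaning the Application Masters in a queue may use at most 10% of that queue's total resources. The purpose is to prevent Application Masters from taking most of the resources and leaving Map/Reduce tasks unable to run.

In production the default can be kept. In a learning environment, however, total cluster resources are small; if only 10% is given to Application Masters, only one job may be able to run at a time, because a single Application Master may already reach the 10% cap. The value can therefore be raised moderately here.

  1. On ha01, change the following parameter in /opt/module/hadoop-3.1.3/etc/hadoop/capacity-scheduler.xml

    vim capacity-scheduler.xml

    <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.8</value>
    </property>

  2. Distribute the capacity-scheduler.xml configuration file

  3. Stop any running jobs and restart the Yarn cluster

    sbin/stop-yarn.sh
    sbin/start-yarn.sh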
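
The effect of the Application Master cap described above can be estimated with simple arithmetic. The sketch below is rough and ignores the scheduler's rounding and per-user limits; the queue size (12288 MB) and the Application Master size (1536 MB) are hypothetical numbers chosen only for illustration, and `max_concurrent_ams` is not a Yarn command.

```shell
#!/bin/bash
# Rough arithmetic only: how many Application Masters fit under the AM cap.
#   $1 = total memory of the queue in MB (hypothetical value)
#   $2 = yarn.scheduler.capacity.maximum-am-resource-percent
#   $3 = memory requested by one Application Master in MB (hypothetical value)
max_concurrent_ams() {
  awk -v total="$1" -v pct="$2" -v am="$3" \
    'BEGIN { printf "%d\n", int(total * pct / am) }'
}

max_concurrent_ams 12288 0.1 1536
max_concurrent_ams 12288 0.8 1536
```

With these assumed numbers the default of 0.1 leaves room for less than one full Application Master, while 0.8 leaves room for six, which is why a small learning cluster benefits from a larger value.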

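# Data warehouse development environment

DBeaver or DataGrip can be used as the development tool. Both connect to Hive over JDBC, so HiveServer2 has to be running.

  1. Start HiveServer2

    hiveserver2

  2. Configure the DataGrip connection

    1. Create a connection

    2. Configure the connection properties

      Configure all properties the same way as for Hive's beeline client. On first use, the setup will report that the JDBC driver is missing; download it as prompted.

  3. Edit the connection to specify the database to connect to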

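# Data preparation

When a company builds a data warehouse, its business systems usually already hold some historical data. To simulate a realistic scenario, some historical data is prepared here. Assume the warehouse goes live on 2020-06-14; the details are as follows.

# User-behavior logs

User-behavior logs generally have no historical data, so only one day of logs, 2020-06-14, needs to be prepared. The steps are:

  1. Start the log collection pipeline, including Flume, Kafka, and so on
  2. On the two log servers (hadoop102, hadoop103), edit /opt/module/applog/application.yml and set the mock.date parameter to 2020-06-14.
  3. Run the log generation script lg.sh.
  4. Check that the corresponding files appear on HDFS.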

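# Business data

Business data generally does have history; here, data for 2020-06-10 through 2020-06-14 needs to be prepared. The steps are as follows.

  1. On ha01, edit /opt/module/db_log/application.properties and set the three parameters mock.date, mock.clear, and mock.clear.user to the values shown in the figure.

  2. Run the command that generates mock business data to produce the historical data for the first day, 2020-06-10

    java -jar gmall2020-mock-db-2021-01-22.jar

  3. Edit /opt/module/db_log/application.properties and set the three parameters mock.date, mock.clear, and mock.clear.user to the values shown in the figure.

  4. Run the command again to produce the historical data for the second day, 2020-06-11.

    java -jar gmall2020-mock-db-2021-01-22.jar

  5. After that, change only the mock.date parameter in /opt/module/db_log/application.properties, setting it in turn to 2020-06-12, 2020-06-13, and 2020-06-14, and generate the data for each date.

  6. Run the mysql_to_hdfs_init.sh script to sync the generated business data to HDFS

    mysql_to_hdfs_init.sh all 2020-06-14

  7. Check that the corresponding data appears on HDFS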
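
Stepping through the remaining dates by hand is repetitive, so a small loop can help. The sketch below only prints the dates as a dry run; it assumes GNU date, and `date_range` is a hypothetical helper. Editing mock.date and starting the jar for each date is left as a comment because the exact layout of the properties file is not shown here.

```shell
#!/bin/bash
# Print every date from $1 to $2 inclusive, one per line (YYYY-MM-DD).
# Relies on GNU date; -u keeps the arithmetic free of daylight-saving shifts.
date_range() {
  d="$1"
  end_s=$(date -u -d "$2" +%s)
  while [ "$(date -u -d "$d" +%s)" -le "$end_s" ]; do
    echo "$d"
    d=$(date -u -d "$d +1 day" +%F)
  done
}

for day in $(date_range 2020-06-12 2020-06-14); do
  # Dry run: in a real run you would set mock.date to $day in
  # /opt/module/db_log/application.properties and then start the jar.
  echo "generate business data for $day"
done
```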

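# Building the warehouse: ODS layer

  1. Keep the data exactly as it arrived, without any modification, so that this layer serves as a backup.
  2. Compress the data with LZO to save disk space. 100 GB of data can be compressed to under 10 GB.
  3. Create partitioned tables to avoid full table scans later; partitioned tables are used heavily in production development.
  4. Create external tables. In production development, apart from temporary tables for personal use, which are created as managed tables, external tables are used in the vast majority of cases.

# ODS layer (user-behavior data)

# Creating the log table ods_log

  1. Create a partitioned table that supports LZO compression

    drop table if exists ods_log;
    CREATE EXTERNAL TABLE ods_log (`line` string)
    PARTITIONED BY (`dt` string) -- partition by date
    STORED AS -- storage format: data is read with LzoTextInputFormat
      INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
    LOCATION '/warehouse/gmall/ods/ods_log'  -- where the data is stored on HDFS
    ;

    For details on LZO compression in Hive, see: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO

  2. Partition planning

  3. Load the data

    load data inpath '/origin_data/gmall/log/topic_log/2020-06-14' into table ods_log partition(dt='2020-06-14');

  4. Build an index for the LZO-compressed files

    hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/gmall/ods/ods_log/dt=2020-06-14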

# ODS layer log table load script

  1. Create the script in the /home/damoncai/bin directory on ha01

    vim hdfs_to_ods_log.sh

    #!/bin/bash

    # Define a variable so the database name is easy to change
    APP=gmall

    # Use the date passed as the first argument; if none is given, use yesterday
    if [ -n "$1" ]; then
       do_date=$1
    else
       do_date=$(date -d "-1 day" +%F)
    fi

    echo "================== log date: $do_date =================="
    sql="
    load data inpath '/origin_data/$APP/log/topic_log/$do_date' into table ${APP}.ods_log partition(dt='$do_date');
    "

    hive -e "$sql"

    # Build the LZO index for the newly loaded partition
    hadoop jar /opt/module/hadoop-3.1.3/share/hadoop/common/hadoop-lzo-0.4.20.jar com.hadoop.compression.lzo.DistributedLzoIndexer /warehouse/$APP/ods/ods_log/dt=$do_date

    1. Note 1:

      [ -n value ] tests whether the value is empty

      -- if the value is non-empty, it returns true

      -- if the value is empty, it returns false

    2. Note 2:

      For usage of the date command, see date --help

    3. Make the script executable

      chmod 777 hdfs_to_ods_log.sh

    4. Run the script

      hdfs_to_ods_log.sh 2020-06-14

    5. Check the imported data
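
The date defaulting used by the load script can be tried in isolation. The following is a minimal sketch, assuming GNU date; the function name `resolve_do_date` is hypothetical and does not appear in the original script.

```shell
#!/bin/bash
# Return the first argument when it is non-empty; otherwise return yesterday's
# date in YYYY-MM-DD form. [ -n "$1" ] is true only for a non-empty string.
resolve_do_date() {
  if [ -n "$1" ]; then
    echo "$1"
  else
    date -d "-1 day" +%F
  fi
}

resolve_do_date 2020-06-14   # explicit date is passed through unchanged
resolve_do_date              # no argument: falls back to yesterday
```

Quoting "$1" inside the test matters: without the quotes, an empty argument would leave `[ -n ]`, which evaluates to true and would skip the fallback.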

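# ODS layer (business data)

The partition plan for the ODS business tables is as follows

The loading approach for the ODS business tables is as follows

# Creating the tables in Hive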

DROP TABLE IF EXISTS ods_activity_info;
CREATE EXTERNAL TABLE ods_activity_info(
    `id` STRING COMMENT 'ID',
    `activity_name` STRING COMMENT 'Activity name',
    `activity_type` STRING COMMENT 'Activity type',
    `start_time` STRING COMMENT 'Start time',
    `end_time` STRING COMMENT 'End time',
    `create_time` STRING COMMENT 'Creation time'
) COMMENT 'Activity information table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_info/';

DROP TABLE IF EXISTS ods_activity_rule;
CREATE EXTERNAL TABLE ods_activity_rule(
    `id` STRING COMMENT 'ID',
    `activity_id` STRING COMMENT 'Activity ID',
    `activity_type` STRING COMMENT 'Activity type',
    `condition_amount` DECIMAL(16,2) COMMENT 'Threshold amount for the discount',
    `condition_num` BIGINT COMMENT 'Threshold item count for the discount',
    `benefit_amount` DECIMAL(16,2) COMMENT 'Discount amount',
    `benefit_discount` DECIMAL(16,2) COMMENT 'Discount rate',
    `benefit_level` STRING COMMENT 'Benefit level'
) COMMENT 'Activity rule table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_activity_rule/';

DROP TABLE IF EXISTS ods_base_category1;
CREATE EXTERNAL TABLE ods_base_category1(
    `id` STRING COMMENT 'id',
    `name` STRING COMMENT 'Name'
) COMMENT 'Level-1 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category1/';

DROP TABLE IF EXISTS ods_base_category2;
CREATE EXTERNAL TABLE ods_base_category2(
    `id` STRING COMMENT 'id',
    `name` STRING COMMENT 'Name',
    `category1_id` STRING COMMENT 'Level-1 category id'
) COMMENT 'Level-2 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category2/';

DROP TABLE IF EXISTS ods_base_category3;
CREATE EXTERNAL TABLE ods_base_category3(
    `id` STRING COMMENT 'id',
    `name` STRING COMMENT 'Name',
    `category2_id` STRING COMMENT 'Level-2 category id'
) COMMENT 'Level-3 product category table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_category3/';

DROP TABLE IF EXISTS ods_base_dic;
CREATE EXTERNAL TABLE ods_base_dic(
    `dic_code` STRING COMMENT 'Code',
    `dic_name` STRING COMMENT 'Code name',
    `parent_code` STRING COMMENT 'Parent code',
    `create_time` STRING COMMENT 'Creation date',
    `operate_time` STRING COMMENT 'Operation date'
) COMMENT 'Code dictionary table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_dic/';

DROP TABLE IF EXISTS ods_base_province;
CREATE EXTERNAL TABLE ods_base_province (
    `id` STRING COMMENT 'ID',
    `name` STRING COMMENT 'Province name',
    `region_id` STRING COMMENT 'Region ID',
    `area_code` STRING COMMENT 'Area code',
    `iso_code` STRING COMMENT 'ISO-3166 code, used for visualization',
    `iso_3166_2` STRING COMMENT 'ISO-3166-2 code, used for visualization'
)  COMMENT 'Province table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_province/';

DROP TABLE IF EXISTS ods_base_region;
CREATE EXTERNAL TABLE ods_base_region (
    `id` STRING COMMENT 'ID',
    `region_name` STRING COMMENT 'Region name'
)  COMMENT 'Region table'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_region/';

DROP TABLE IF EXISTS ods_base_trademark;
CREATE EXTERNAL TABLE ods_base_trademark (
    `id` STRING COMMENT 'ID',
    `tm_name` STRING COMMENT 'Brand name'
)  COMMENT 'Brand table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_base_trademark/';

DROP TABLE IF EXISTS ods_cart_info;
CREATE EXTERNAL TABLE ods_cart_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'User id',
    `sku_id` STRING COMMENT 'SKU id',
    `cart_price` DECIMAL(16,2) COMMENT 'Price when added to the cart',
    `sku_num` BIGINT COMMENT 'Quantity',
    `sku_name` STRING COMMENT 'SKU name (redundant)',
    `create_time` STRING COMMENT 'Creation time',
    `operate_time` STRING COMMENT 'Modification time',
    `is_ordered` STRING COMMENT 'Whether already ordered',
    `order_time` STRING COMMENT 'Order time',
    `source_type` STRING COMMENT 'Source type',
    `source_id` STRING COMMENT 'Source id'
) COMMENT 'Cart table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_cart_info/';

DROP TABLE IF EXISTS ods_comment_info;
CREATE EXTERNAL TABLE ods_comment_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'User ID',
    `sku_id` STRING COMMENT 'Product SKU',
    `spu_id` STRING COMMENT 'Product SPU',
    `order_id` STRING COMMENT 'Order ID',
    `appraise` STRING COMMENT 'Rating',
    `create_time` STRING COMMENT 'Review time'
) COMMENT 'Product review table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_comment_info/';

DROP TABLE IF EXISTS ods_coupon_info;
CREATE EXTERNAL TABLE ods_coupon_info(
    `id` STRING COMMENT 'Coupon ID',
    `coupon_name` STRING COMMENT 'Coupon name',
    `coupon_type` STRING COMMENT 'Coupon type: 1 cash coupon, 2 discount coupon, 3 spend-threshold coupon, 4 item-count discount coupon',
    `condition_amount` DECIMAL(16,2) COMMENT 'Threshold amount',
    `condition_num` BIGINT COMMENT 'Threshold item count',
    `activity_id` STRING COMMENT 'Activity ID',
    `benefit_amount` DECIMAL(16,2) COMMENT 'Amount deducted',
    `benefit_discount` DECIMAL(16,2) COMMENT 'Discount rate',
    `create_time` STRING COMMENT 'Creation time',
    `range_type` STRING COMMENT 'Scope type: 1 product, 2 category, 3 brand',
    `limit_num` BIGINT COMMENT 'Maximum number of claims',
    `taken_count` BIGINT COMMENT 'Number already claimed',
    `start_time` STRING COMMENT 'Claim start time',
    `end_time` STRING COMMENT 'Claim end time',
    `operate_time` STRING COMMENT 'Modification time',
    `expire_time` STRING COMMENT 'Expiration time'
) COMMENT 'Coupon table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_info/';

DROP TABLE IF EXISTS ods_coupon_use;
CREATE EXTERNAL TABLE ods_coupon_use(
    `id` STRING COMMENT 'ID',
    `coupon_id` STRING COMMENT 'Coupon ID',
    `user_id` STRING COMMENT 'User ID',
    `order_id` STRING COMMENT 'Order ID',
    `coupon_status` STRING COMMENT 'Coupon status',
    `get_time` STRING COMMENT 'Claim time',
    `using_time` STRING COMMENT 'Use time (order placed)',
    `used_time` STRING COMMENT 'Use time (paid)',
    `expire_time` STRING COMMENT 'Expiration time'
) COMMENT 'Coupon usage table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_coupon_use/';

DROP TABLE IF EXISTS ods_favor_info;
CREATE EXTERNAL TABLE ods_favor_info(
    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'User id',
    `sku_id` STRING COMMENT 'SKU id',
    `spu_id` STRING COMMENT 'SPU id',
    `is_cancel` STRING COMMENT 'Whether cancelled',
    `create_time` STRING COMMENT 'Favorited time',
    `cancel_time` STRING COMMENT 'Cancellation time'
) COMMENT 'Product favorites table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_favor_info/';

DROP TABLE IF EXISTS ods_order_detail;
CREATE EXTERNAL TABLE ods_order_detail(
    `id` STRING COMMENT 'ID',
    `order_id` STRING COMMENT 'Order ID',
    `sku_id` STRING COMMENT 'Product id',
    `sku_name` STRING COMMENT 'Product name',
    `order_price` DECIMAL(16,2) COMMENT 'Product price',
    `sku_num` BIGINT COMMENT 'Product quantity',
    `create_time` STRING COMMENT 'Creation time',
    `source_type` STRING COMMENT 'Source type',
    `source_id` STRING COMMENT 'Source id',
    `split_final_amount` DECIMAL(16,2) COMMENT 'Apportioned final amount',
    `split_activity_amount` DECIMAL(16,2) COMMENT 'Apportioned activity discount',
    `split_coupon_amount` DECIMAL(16,2) COMMENT 'Apportioned coupon discount'
) COMMENT 'Order detail table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail/';

DROP TABLE IF EXISTS ods_order_detail_activity;
CREATE EXTERNAL TABLE ods_order_detail_activity(
    `id` STRING COMMENT 'ID',
    `order_id` STRING COMMENT 'Order ID',
    `order_detail_id` STRING COMMENT 'Order detail id',
    `activity_id` STRING COMMENT 'Activity id',
    `activity_rule_id` STRING COMMENT 'Activity rule id',
    `sku_id` STRING COMMENT 'Product id',
    `create_time` STRING COMMENT 'Creation time'
) COMMENT 'Order detail to activity association table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_activity/';

DROP TABLE IF EXISTS ods_order_detail_coupon;
CREATE EXTERNAL TABLE ods_order_detail_coupon(
    `id` STRING COMMENT 'ID',
    `order_id` STRING COMMENT 'Order ID',
    `order_detail_id` STRING COMMENT 'Order detail id',
    `coupon_id` STRING COMMENT 'Coupon id',
    `coupon_use_id` STRING COMMENT 'Coupon usage record id',
    `sku_id` STRING COMMENT 'Product id',
    `create_time` STRING COMMENT 'Creation time'
) COMMENT 'Order detail to coupon association table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_detail_coupon/';

DROP TABLE IF EXISTS ods_order_info;
CREATE EXTERNAL TABLE ods_order_info (
    `id` STRING COMMENT 'Order ID',
    `final_amount` DECIMAL(16,2) COMMENT 'Final order amount',
    `order_status` STRING COMMENT 'Order status',
    `user_id` STRING COMMENT 'User id',
    `payment_way` STRING COMMENT 'Payment method',
    `delivery_address` STRING COMMENT 'Delivery address',
    `out_trade_no` STRING COMMENT 'Payment transaction number',
    `create_time` STRING COMMENT 'Creation time',
    `operate_time` STRING COMMENT 'Operation time',
    `expire_time` STRING COMMENT 'Expiration time',
    `tracking_no` STRING COMMENT 'Tracking number',
    `province_id` STRING COMMENT 'Province ID',
    `activity_reduce_amount` DECIMAL(16,2) COMMENT 'Activity discount amount',
    `coupon_reduce_amount` DECIMAL(16,2) COMMENT 'Coupon discount amount',
    `original_amount` DECIMAL(16,2) COMMENT 'Original order amount',
    `feight_fee` DECIMAL(16,2) COMMENT 'Shipping fee',
    `feight_fee_reduce` DECIMAL(16,2) COMMENT 'Shipping fee discount'
) COMMENT 'Order table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_info/';

DROP TABLE IF EXISTS ods_order_refund_info;
CREATE EXTERNAL TABLE ods_order_refund_info(
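    `id` STRING COMMENT 'ID',
    `user_id` STRING COMMENT 'User ID',
    `order_id` STRING COMMENT 'Order ID',
    `sku_id` STRING COMMENT 'Product ID',
    `refund_type` STRING COMMENT 'Refund type',
    `refund_num` BIGINT COMMENT 'Number of items refunded',
    `refund_amount` DECIMAL(16,2) COMMENT 'Refund amount',
    `refund_reason_type` STRING COMMENT 'Refund reason type',
    `refund_status` STRING COMMENT 'Refund status',-- The refund status should cover buyer request, seller review, goods received by seller, refund completed, and so on. Those states are not modeled here, so this table is loaded incrementally
    `create_time` STRING COMMENT 'Refund request time'
) COMMENT 'Order refund table'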
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_refund_info/';

DROP TABLE IF EXISTS ods_order_status_log;
CREATE EXTERNAL TABLE ods_order_status_log (
    `id` STRING COMMENT 'ID',
    `order_id` STRING COMMENT 'Order ID',
    `order_status` STRING COMMENT 'Order status',
    `operate_time` STRING COMMENT 'Modification time'
)  COMMENT 'Order status log table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_order_status_log/';

DROP TABLE IF EXISTS ods_payment_info;
CREATE EXTERNAL TABLE ods_payment_info(
    `id` STRING COMMENT 'ID',
    `out_trade_no` STRING COMMENT 'External business number',
    `order_id` STRING COMMENT 'Order ID',
    `user_id` STRING COMMENT 'User ID',
    `payment_type` STRING COMMENT 'Payment type',
    `trade_no` STRING COMMENT 'Transaction number',
    `payment_amount` DECIMAL(16,2) COMMENT 'Payment amount',
    `subject` STRING COMMENT 'Transaction description',
    `payment_status` STRING COMMENT 'Payment status',
    `create_time` STRING COMMENT 'Creation time',
    `callback_time` STRING COMMENT 'Callback time'
)  COMMENT 'Payment record table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_payment_info/';

DROP TABLE IF EXISTS ods_refund_payment;
CREATE EXTERNAL TABLE ods_refund_payment(
    `id` STRING COMMENT 'ID',
    `out_trade_no` STRING COMMENT 'External business number',
    `order_id` STRING COMMENT 'Order ID',
    `sku_id` STRING COMMENT 'SKU ID',
    `payment_type` STRING COMMENT 'Payment type',
    `trade_no` STRING COMMENT 'Transaction number',
    `refund_amount` DECIMAL(16,2) COMMENT 'Refund amount',
    `subject` STRING COMMENT 'Transaction description',
    `refund_status` STRING COMMENT 'Refund status',
    `create_time` STRING COMMENT 'Creation time',
    `callback_time` STRING COMMENT 'Callback time'
)  COMMENT 'Refund payment record table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_refund_payment/';

DROP TABLE IF EXISTS ods_sku_attr_value;
CREATE EXTERNAL TABLE ods_sku_attr_value(
    `id` STRING COMMENT 'ID',
    `attr_id` STRING COMMENT 'Platform attribute ID',
    `value_id` STRING COMMENT 'Platform attribute value ID',
    `sku_id` STRING COMMENT 'Product ID',
    `attr_name` STRING COMMENT 'Platform attribute name',
    `value_name` STRING COMMENT 'Platform attribute value name'
) COMMENT 'SKU platform attribute table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_attr_value/';

DROP TABLE IF EXISTS ods_sku_info;
CREATE EXTERNAL TABLE ods_sku_info(
    `id` STRING COMMENT 'SKU id',
    `spu_id` STRING COMMENT 'SPU id',
    `price` DECIMAL(16,2) COMMENT 'Price',
    `sku_name` STRING COMMENT 'Product name',
    `sku_desc` STRING COMMENT 'Product description',
    `weight` DECIMAL(16,2) COMMENT 'Weight',
    `tm_id` STRING COMMENT 'Brand id',
    `category3_id` STRING COMMENT 'Category id',
    `is_sale` STRING COMMENT 'Whether on sale',
    `create_time` STRING COMMENT 'Creation time'
) COMMENT 'SKU product table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_info/';

DROP TABLE IF EXISTS ods_sku_sale_attr_value;
CREATE EXTERNAL TABLE ods_sku_sale_attr_value(
    `id` STRING COMMENT 'ID',
    `sku_id` STRING COMMENT 'sku_id',
    `spu_id` STRING COMMENT 'spu_id',
    `sale_attr_value_id` STRING COMMENT 'Sales attribute value id',
    `sale_attr_id` STRING COMMENT 'Sales attribute id',
    `sale_attr_name` STRING COMMENT 'Sales attribute name',
    `sale_attr_value_name` STRING COMMENT 'Sales attribute value name'
) COMMENT 'SKU sales attribute value table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_sku_sale_attr_value/';

DROP TABLE IF EXISTS ods_spu_info;
CREATE EXTERNAL TABLE ods_spu_info(
    `id` STRING COMMENT 'SPU id',
    `spu_name` STRING COMMENT 'SPU name',
    `category3_id` STRING COMMENT 'Category id',
    `tm_id` STRING COMMENT 'Brand id'
) COMMENT 'SPU product table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_spu_info/';

DROP TABLE IF EXISTS ods_user_info;
CREATE EXTERNAL TABLE ods_user_info(
    `id` STRING COMMENT 'User id',
    `login_name` STRING COMMENT 'Login name',
    `nick_name` STRING COMMENT 'Nickname',
    `name` STRING COMMENT 'Real name',
    `phone_num` STRING COMMENT 'Mobile phone number',
    `email` STRING COMMENT 'Email',
    `user_level` STRING COMMENT 'User level',
    `birthday` STRING COMMENT 'Birthday',
    `gender` STRING COMMENT 'Gender',
    `create_time` STRING COMMENT 'Creation time',
    `operate_time` STRING COMMENT 'Operation time'
) COMMENT 'User table'
PARTITIONED BY (`dt` STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS
  INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/warehouse/gmall/ods/ods_user_info/';

# ODS layer business tables: first-day data load script

  1. Write the script

  2. Create the script hdfs_to_ods_db_init.sh in the /home/damoncai/bin directory

    vim hdfs_to_ods_db_init.sh

    #!/bin/bash
    
    APP=gmall
    
    if [ -n "$2" ] ;then
       do_date=$2
    else 
       echo "Please pass in a date argument"
       exit 1
    fi 
    
    ods_order_info=" 
    load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"
    
    ods_order_detail="
    load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"
    
    ods_sku_info="
    load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"
    
    ods_user_info="
    load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"
    
    ods_payment_info="
    load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"
    
    ods_base_category1="
    load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"
    
    ods_base_category2="
    load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"
    
    ods_base_category3="
    load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date'); "
    
    ods_base_trademark="
    load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date'); "
    
    ods_activity_info="
    load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date'); "
    
    ods_cart_info="
    load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date'); "
    
    ods_comment_info="
    load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date'); "
    
    ods_coupon_info="
    load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date'); "
    
    ods_coupon_use="
    load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date'); "
    
    ods_favor_info="
    load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date'); "
    
    ods_order_refund_info="
    load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date'); "
    
    ods_order_status_log="
    load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date'); "
    
    ods_spu_info="
    load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date'); "
    
    ods_activity_rule="
    load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');" 
    
    ods_base_dic="
    load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date'); "
    
    ods_order_detail_activity="
    load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date'); "
    
    ods_order_detail_coupon="
    load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date'); "
    
    ods_refund_payment="
    load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date'); "
    
    ods_sku_attr_value="
    load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date'); "
    
    ods_sku_sale_attr_value="
    load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date'); "
    
    ods_base_province=" 
    load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"
    
    ods_base_region="
    load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"
    
    case $1 in
        "ods_order_info"){
            hive -e "$ods_order_info"
        };;
        "ods_order_detail"){
            hive -e "$ods_order_detail"
        };;
        "ods_sku_info"){
            hive -e "$ods_sku_info"
        };;
        "ods_user_info"){
            hive -e "$ods_user_info"
        };;
        "ods_payment_info"){
            hive -e "$ods_payment_info"
        };;
        "ods_base_category1"){
            hive -e "$ods_base_category1"
        };;
        "ods_base_category2"){
            hive -e "$ods_base_category2"
        };;
        "ods_base_category3"){
            hive -e "$ods_base_category3"
        };;
        "ods_base_trademark"){
            hive -e "$ods_base_trademark"
        };;
        "ods_activity_info"){
            hive -e "$ods_activity_info"
        };;
        "ods_cart_info"){
            hive -e "$ods_cart_info"
        };;
        "ods_comment_info"){
            hive -e "$ods_comment_info"
        };;
        "ods_coupon_info"){
            hive -e "$ods_coupon_info"
        };;
        "ods_coupon_use"){
            hive -e "$ods_coupon_use"
        };;
        "ods_favor_info"){
            hive -e "$ods_favor_info"
        };;
        "ods_order_refund_info"){
            hive -e "$ods_order_refund_info"
        };;
        "ods_order_status_log"){
            hive -e "$ods_order_status_log"
        };;
        "ods_spu_info"){
            hive -e "$ods_spu_info"
        };;
        "ods_activity_rule"){
            hive -e "$ods_activity_rule"
        };;
        "ods_base_dic"){
            hive -e "$ods_base_dic"
        };;
        "ods_order_detail_activity"){
            hive -e "$ods_order_detail_activity"
        };;
        "ods_order_detail_coupon"){
            hive -e "$ods_order_detail_coupon"
        };;
        "ods_refund_payment"){
            hive -e "$ods_refund_payment"
        };;
        "ods_sku_attr_value"){
            hive -e "$ods_sku_attr_value"
        };;
        "ods_sku_sale_attr_value"){
            hive -e "$ods_sku_sale_attr_value"
        };;
        "ods_base_province"){
            hive -e "$ods_base_province"
        };;
        "ods_base_region"){
            hive -e "$ods_base_region"
        };;
        "all"){
            hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value$ods_base_province$ods_base_region"
        };;
    esac
    
  3. Grant the script execute permission (chmod +x hdfs_to_ods_db_init.sh)

  4. Run the script

    hdfs_to_ods_db_init.sh all 2020-06-14
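
The script above repeats the same LOAD DATA statement once per table, changing only the table name. As a hedged sketch of the same idea (not the original script), the statements for the partitioned tables could be generated in a loop. The helper names build_load_sql and build_all_sql are hypothetical, and the sketch assumes, as the script above does, that each source directory is the table name without the ods_ prefix.

```shell
# Sketch only: generate the LOAD DATA statements instead of writing one variable per table.
APP=gmall

# Print one LOAD DATA statement for a partitioned ODS table.
# $1 = ODS table name (e.g. ods_order_info), $2 = partition date (e.g. 2020-06-14)
build_load_sql() {
    local table="$1"
    local do_date="$2"
    # Assumption: the HDFS source directory is the table name without the "ods_" prefix.
    local src="${table#ods_}"
    echo "load data inpath '/origin_data/$APP/db/$src/$do_date' OVERWRITE into table ${APP}.$table partition(dt='$do_date');"
}

# Print the statements for several tables, one per line.
# $1 = partition date, remaining arguments = ODS table names
build_all_sql() {
    local do_date="$1"
    shift
    local table
    for table in "$@"; do
        build_load_sql "$table" "$do_date"
    done
}
```

The output could then be passed to Hive in one call, for example hive -e "$(build_all_sql 2020-06-14 ods_order_info ods_order_detail)". The unpartitioned tables ods_base_province and ods_base_region have no partition clause in the script above, so they would still need their own statements.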

# ODS layer business tables: daily data load script

  1. Write the script

  2. Create the script hdfs_to_ods_db.sh in the /home/damoncai/bin directory

    vim hdfs_to_ods_db.sh

    #!/bin/bash
    
    APP=gmall
    
    # Use the date passed as the second argument if there is one; otherwise use the day before today
    if [ -n "$2" ] ;then
        do_date=$2
    else 
        do_date=`date -d "-1 day" +%F`
    fi
    
    ods_order_info=" 
    load data inpath '/origin_data/$APP/db/order_info/$do_date' OVERWRITE into table ${APP}.ods_order_info partition(dt='$do_date');"
    
    ods_order_detail="
    load data inpath '/origin_data/$APP/db/order_detail/$do_date' OVERWRITE into table ${APP}.ods_order_detail partition(dt='$do_date');"
    
    ods_sku_info="
    load data inpath '/origin_data/$APP/db/sku_info/$do_date' OVERWRITE into table ${APP}.ods_sku_info partition(dt='$do_date');"
    
    ods_user_info="
    load data inpath '/origin_data/$APP/db/user_info/$do_date' OVERWRITE into table ${APP}.ods_user_info partition(dt='$do_date');"
    
    ods_payment_info="
    load data inpath '/origin_data/$APP/db/payment_info/$do_date' OVERWRITE into table ${APP}.ods_payment_info partition(dt='$do_date');"
    
    ods_base_category1="
    load data inpath '/origin_data/$APP/db/base_category1/$do_date' OVERWRITE into table ${APP}.ods_base_category1 partition(dt='$do_date');"
    
    ods_base_category2="
    load data inpath '/origin_data/$APP/db/base_category2/$do_date' OVERWRITE into table ${APP}.ods_base_category2 partition(dt='$do_date');"
    
    ods_base_category3="
    load data inpath '/origin_data/$APP/db/base_category3/$do_date' OVERWRITE into table ${APP}.ods_base_category3 partition(dt='$do_date'); "
    
    ods_base_trademark="
    load data inpath '/origin_data/$APP/db/base_trademark/$do_date' OVERWRITE into table ${APP}.ods_base_trademark partition(dt='$do_date'); "
    
    ods_activity_info="
    load data inpath '/origin_data/$APP/db/activity_info/$do_date' OVERWRITE into table ${APP}.ods_activity_info partition(dt='$do_date'); "
    
    ods_cart_info="
    load data inpath '/origin_data/$APP/db/cart_info/$do_date' OVERWRITE into table ${APP}.ods_cart_info partition(dt='$do_date'); "
    
    ods_comment_info="
    load data inpath '/origin_data/$APP/db/comment_info/$do_date' OVERWRITE into table ${APP}.ods_comment_info partition(dt='$do_date'); "
    
    ods_coupon_info="
    load data inpath '/origin_data/$APP/db/coupon_info/$do_date' OVERWRITE into table ${APP}.ods_coupon_info partition(dt='$do_date'); "
    
    ods_coupon_use="
    load data inpath '/origin_data/$APP/db/coupon_use/$do_date' OVERWRITE into table ${APP}.ods_coupon_use partition(dt='$do_date'); "
    
    ods_favor_info="
    load data inpath '/origin_data/$APP/db/favor_info/$do_date' OVERWRITE into table ${APP}.ods_favor_info partition(dt='$do_date'); "
    
    ods_order_refund_info="
    load data inpath '/origin_data/$APP/db/order_refund_info/$do_date' OVERWRITE into table ${APP}.ods_order_refund_info partition(dt='$do_date'); "
    
    ods_order_status_log="
    load data inpath '/origin_data/$APP/db/order_status_log/$do_date' OVERWRITE into table ${APP}.ods_order_status_log partition(dt='$do_date'); "
    
    ods_spu_info="
    load data inpath '/origin_data/$APP/db/spu_info/$do_date' OVERWRITE into table ${APP}.ods_spu_info partition(dt='$do_date'); "
    
    ods_activity_rule="
    load data inpath '/origin_data/$APP/db/activity_rule/$do_date' OVERWRITE into table ${APP}.ods_activity_rule partition(dt='$do_date');" 
    
    ods_base_dic="
    load data inpath '/origin_data/$APP/db/base_dic/$do_date' OVERWRITE into table ${APP}.ods_base_dic partition(dt='$do_date'); "
    
    ods_order_detail_activity="
    load data inpath '/origin_data/$APP/db/order_detail_activity/$do_date' OVERWRITE into table ${APP}.ods_order_detail_activity partition(dt='$do_date'); "
    
    ods_order_detail_coupon="
    load data inpath '/origin_data/$APP/db/order_detail_coupon/$do_date' OVERWRITE into table ${APP}.ods_order_detail_coupon partition(dt='$do_date'); "
    
    ods_refund_payment="
    load data inpath '/origin_data/$APP/db/refund_payment/$do_date' OVERWRITE into table ${APP}.ods_refund_payment partition(dt='$do_date'); "
    
    ods_sku_attr_value="
    load data inpath '/origin_data/$APP/db/sku_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_attr_value partition(dt='$do_date'); "
    
    ods_sku_sale_attr_value="
    load data inpath '/origin_data/$APP/db/sku_sale_attr_value/$do_date' OVERWRITE into table ${APP}.ods_sku_sale_attr_value partition(dt='$do_date'); "
    
    ods_base_province=" 
    load data inpath '/origin_data/$APP/db/base_province/$do_date' OVERWRITE into table ${APP}.ods_base_province;"
    
    ods_base_region="
    load data inpath '/origin_data/$APP/db/base_region/$do_date' OVERWRITE into table ${APP}.ods_base_region;"
    
    case $1 in
        "ods_order_info"){
            hive -e "$ods_order_info"
        };;
        "ods_order_detail"){
            hive -e "$ods_order_detail"
        };;
        "ods_sku_info"){
            hive -e "$ods_sku_info"
        };;
        "ods_user_info"){
            hive -e "$ods_user_info"
        };;
        "ods_payment_info"){
            hive -e "$ods_payment_info"
        };;
        "ods_base_category1"){
            hive -e "$ods_base_category1"
        };;
        "ods_base_category2"){
            hive -e "$ods_base_category2"
        };;
        "ods_base_category3"){
            hive -e "$ods_base_category3"
        };;
        "ods_base_trademark"){
            hive -e "$ods_base_trademark"
        };;
        "ods_activity_info"){
            hive -e "$ods_activity_info"
        };;
        "ods_cart_info"){
            hive -e "$ods_cart_info"
        };;
        "ods_comment_info"){
            hive -e "$ods_comment_info"
        };;
        "ods_coupon_info"){
            hive -e "$ods_coupon_info"
        };;
        "ods_coupon_use"){
            hive -e "$ods_coupon_use"
        };;
        "ods_favor_info"){
            hive -e "$ods_favor_info"
        };;
        "ods_order_refund_info"){
            hive -e "$ods_order_refund_info"
        };;
        "ods_order_status_log"){
            hive -e "$ods_order_status_log"
        };;
        "ods_spu_info"){
            hive -e "$ods_spu_info"
        };;
        "ods_activity_rule"){
            hive -e "$ods_activity_rule"
        };;
        "ods_base_dic"){
            hive -e "$ods_base_dic"
        };;
        "ods_order_detail_activity"){
            hive -e "$ods_order_detail_activity"
        };;
        "ods_order_detail_coupon"){
            hive -e "$ods_order_detail_coupon"
        };;
        "ods_refund_payment"){
            hive -e "$ods_refund_payment"
        };;
        "ods_sku_attr_value"){
            hive -e "$ods_sku_attr_value"
        };;
        "ods_sku_sale_attr_value"){
            hive -e "$ods_sku_sale_attr_value"
        };;
        "all"){
            hive -e "$ods_order_info$ods_order_detail$ods_sku_info$ods_user_info$ods_payment_info$ods_base_category1$ods_base_category2$ods_base_category3$ods_base_trademark$ods_activity_info$ods_cart_info$ods_comment_info$ods_coupon_info$ods_coupon_use$ods_favor_info$ods_order_refund_info$ods_order_status_log$ods_spu_info$ods_activity_rule$ods_base_dic$ods_order_detail_activity$ods_order_detail_coupon$ods_refund_payment$ods_sku_attr_value$ods_sku_sale_attr_value"
        };;
    esac
    
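  3. Grant the script execute permission (chmod +x hdfs_to_ods_db.sh)

  4. Run the script

    hdfs_to_ods_db.sh all 2020-06-14

  5. Check that the data was loaded successfully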
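
The daily script differs from the first-day script mainly in how it picks the date: it falls back to yesterday when no date is given. The sketch below isolates that logic and adds one possible way to check a loaded partition. Both function names are hypothetical, and date -d is GNU date syntax, the same call the script above uses.

```shell
# Sketch only: date defaulting as in the daily script, plus a row-count check.

# Print the date to process: the argument if it is non-empty, otherwise yesterday.
# $1 = optional date in YYYY-MM-DD form
resolve_do_date() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        date -d "-1 day" +%F
    fi
}

# Print a query that counts the rows in one partition of a table in the gmall database.
# $1 = table name, $2 = partition date
build_check_sql() {
    echo "select count(*) from gmall.$1 where dt='$2';"
}
```

For example, hive -e "$(build_check_sql ods_order_info 2020-06-14)" would report how many rows landed in that partition; a count of 0 suggests the load did not run or the source directory was empty.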

# Data warehouse build: DIM layer

# Product dimension table (full snapshot)

  1. Table creation statement

    DROP TABLE IF EXISTS dim_sku_info;
    CREATE EXTERNAL TABLE dim_sku_info (
        `id` STRING COMMENT 'Product (SKU) id',
        `price` DECIMAL(16,2) COMMENT 'Product price',
        `sku_name` STRING COMMENT 'Product name',
        `sku_desc` STRING COMMENT 'Product description',
        `weight` DECIMAL(16,2) COMMENT 'Weight',
        `is_sale` BOOLEAN COMMENT 'Whether on sale',
        `spu_id` STRING COMMENT 'SPU id',
        `spu_name` STRING COMMENT 'SPU name',
        `category3_id` STRING COMMENT 'Level-3 category id',
        `category3_name` STRING COMMENT 'Level-3 category name',
        `category2_id` STRING COMMENT 'Level-2 category id',
        `category2_name` STRING COMMENT 'Level-2 category name',
        `category1_id` STRING COMMENT 'Level-1 category id',
        `category1_name` STRING COMMENT 'Level-1 category name',
        `tm_id` STRING COMMENT 'Brand id',
        `tm_name` STRING COMMENT 'Brand name',
        `sku_attr_values` ARRAY<STRUCT<attr_id:STRING,value_id:STRING,attr_name:STRING,value_name:STRING>> COMMENT 'Platform attributes',
        `sku_sale_attr_values` ARRAY<STRUCT<sale_attr_id:STRING,sale_attr_value_id:STRING,sale_attr_name:STRING,sale_attr_value_name:STRING>> COMMENT 'Sale attributes',
        `create_time` STRING COMMENT 'Creation time'
    ) COMMENT 'Product dimension table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_sku_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Partition planning

  3. Data loading

  4. Hive and LZO index files

    1. Count the rows in the table in two different ways

      select * from ods_log;
      Time taken: 0.706 seconds, Fetched: 2955 row(s)
      
      hive (gmall)> select count(*) from ods_log;
      2959
      
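    2. The two queries return different row counts

      The reason is that select * from ods_log does not run a MapReduce job. It reads the data directly with the DeprecatedLzoTextInputFormat specified in the ods_log table definition, which recognizes the lzo.index files as index files.

      select count(*) from ods_log does run a MapReduce job, so the input first goes through hive.input.format, whose default value is CombineHiveInputFormat. That format merges the index files as if they were small data files and treats them as ordinary input, which is why the count comes out higher. Worse, it prevents the LZO files from being split.

    3. Solution: change hive.input.format from CombineHiveInputFormat to HiveInputFormat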

      set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
      
  5. First-day load

    with
    sku as
    (
        select
            id,
            price,
            sku_name,
            sku_desc,
            weight,
            is_sale,
            spu_id,
            category3_id,
            tm_id,
            create_time
        from ods_sku_info
        where dt='2020-06-14'
    ),
    spu as
    (
        select
            id,
            spu_name
        from ods_spu_info
        where dt='2020-06-14'
    ),
    c3 as
    (
        select
            id,
            name,
            category2_id
        from ods_base_category3
        where dt='2020-06-14'
    ),
    c2 as
    (
        select
            id,
            name,
            category1_id
        from ods_base_category2
        where dt='2020-06-14'
    ),
    c1 as
    (
        select
            id,
            name
        from ods_base_category1
        where dt='2020-06-14'
    ),
    tm as
    (
        select
            id,
            tm_name
        from ods_base_trademark
        where dt='2020-06-14'
    ),
    attr as
    (
        select
            sku_id,
            collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
        from ods_sku_attr_value
        where dt='2020-06-14'
        group by sku_id
    ),
    sale_attr as
    (
        select
            sku_id,
            collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
        from ods_sku_sale_attr_value
        where dt='2020-06-14'
        group by sku_id
    )
    insert overwrite table dim_sku_info partition(dt='2020-06-14')
    select
        sku.id,
        sku.price,
        sku.sku_name,
        sku.sku_desc,
        sku.weight,
        sku.is_sale,
        sku.spu_id,
        spu.spu_name,
        sku.category3_id,
        c3.name,
        c3.category2_id,
        c2.name,
        c2.category1_id,
        c1.name,
        sku.tm_id,
        tm.tm_name,
        attr.attrs,
        sale_attr.sale_attrs,
        sku.create_time
    from sku
    left join spu on sku.spu_id=spu.id
    left join c3 on sku.category3_id=c3.id
    left join c2 on c3.category2_id=c2.id
    left join c1 on c2.category1_id=c1.id
    left join tm on sku.tm_id=tm.id
    left join attr on sku.id=attr.sku_id
    left join sale_attr on sku.id=sale_attr.sku_id;
    
  6. Daily load

    with
    sku as
    (
        select
            id,
            price,
            sku_name,
            sku_desc,
            weight,
            is_sale,
            spu_id,
            category3_id,
            tm_id,
            create_time
        from ods_sku_info
        where dt='2020-06-15'
    ),
    spu as
    (
        select
            id,
            spu_name
        from ods_spu_info
        where dt='2020-06-15'
    ),
    c3 as
    (
        select
            id,
            name,
            category2_id
        from ods_base_category3
        where dt='2020-06-15'
    ),
    c2 as
    (
        select
            id,
            name,
            category1_id
        from ods_base_category2
        where dt='2020-06-15'
    ),
    c1 as
    (
        select
            id,
            name
        from ods_base_category1
        where dt='2020-06-15'
    ),
    tm as
    (
        select
            id,
            tm_name
        from ods_base_trademark
        where dt='2020-06-15'
    ),
    attr as
    (
        select
            sku_id,
            collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
        from ods_sku_attr_value
        where dt='2020-06-15'
        group by sku_id
    ),
    sale_attr as
    (
        select
            sku_id,
            collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
        from ods_sku_sale_attr_value
        where dt='2020-06-15'
        group by sku_id
    )
    insert overwrite table dim_sku_info partition(dt='2020-06-15')
    select
        sku.id,
        sku.price,
        sku.sku_name,
        sku.sku_desc,
        sku.weight,
        sku.is_sale,
        sku.spu_id,
        spu.spu_name,
        sku.category3_id,
        c3.name,
        c3.category2_id,
        c2.name,
        c2.category1_id,
        c1.name,
        sku.tm_id,
        tm.tm_name,
        attr.attrs,
        sale_attr.sale_attrs,
        sku.create_time
    from sku
    left join spu on sku.spu_id=spu.id
    left join c3 on sku.category3_id=c3.id
    left join c2 on c3.category2_id=c2.id
    left join c1 on c2.category1_id=c1.id
    left join tm on sku.tm_id=tm.id
    left join attr on sku.id=attr.sku_id
    left join sale_attr on sku.id=sale_attr.sku_id;
    
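
The first-day and daily load statements above are identical except for the hard-coded date, and both are affected by the hive.input.format setting discussed in the index-file step. As a sketch of how the two could be combined in a script, the function below fills a date into a SQL template and prepends the setting. The placeholder __DO_DATE__ and the name render_sql are hypothetical and do not appear in the original notes.

```shell
# Sketch only: render a dated SQL statement with the input-format setting in front.

# The setting from the index-file step above.
HIVE_INPUT_FORMAT_SETTING="set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;"

# Print the setting followed by the SQL template with its date placeholder filled in.
# $1 = SQL text containing the literal marker __DO_DATE__ (a hypothetical placeholder)
# $2 = partition date in YYYY-MM-DD form
render_sql() {
    local template="$1"
    local do_date="$2"
    printf '%s\n' "$HIVE_INPUT_FORMAT_SETTING"
    printf '%s\n' "$template" | sed "s/__DO_DATE__/$do_date/g"
}
```

With the load statement stored in a variable whose dates are written as __DO_DATE__, a call such as hive -e "$(render_sql "$dim_sku_info_sql" 2020-06-15)" would run it for a given day. Here dim_sku_info_sql is likewise an assumed variable name.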

# Coupon dimension table (full snapshot)

  1. Table creation statement

    DROP TABLE IF EXISTS dim_coupon_info;
    CREATE EXTERNAL TABLE dim_coupon_info(
        `id` STRING COMMENT 'Coupon id',
        `coupon_name` STRING COMMENT 'Coupon name',
        `coupon_type` STRING COMMENT 'Coupon type: 1 cash coupon, 2 discount coupon, 3 amount-threshold coupon, 4 quantity-threshold discount coupon',
        `condition_amount` DECIMAL(16,2) COMMENT 'Threshold amount',
        `condition_num` BIGINT COMMENT 'Threshold quantity',
        `activity_id` STRING COMMENT 'Activity id',
        `benefit_amount` DECIMAL(16,2) COMMENT 'Amount deducted',
        `benefit_discount` DECIMAL(16,2) COMMENT 'Discount',
        `create_time` STRING COMMENT 'Creation time',
        `range_type` STRING COMMENT 'Range type: 1 product, 2 category, 3 brand',
        `limit_num` BIGINT COMMENT 'Maximum number of claims',
        `taken_count` BIGINT COMMENT 'Number already claimed',
        `start_time` STRING COMMENT 'Start date for claiming',
        `end_time` STRING COMMENT 'End date for claiming',
        `operate_time` STRING COMMENT 'Modification time',
        `expire_time` STRING COMMENT 'Expiration time'
    ) COMMENT 'Coupon dimension table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_coupon_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Partition planning

  3. Data loading

  4. First-day load

    insert overwrite table dim_coupon_info partition(dt='2020-06-14')
    select
        id,
        coupon_name,
        coupon_type,
        condition_amount,
        condition_num,
        activity_id,
        benefit_amount,
        benefit_discount,
        create_time,
        range_type,
        limit_num,
        taken_count,
        start_time,
        end_time,
        operate_time,
        expire_time
    from ods_coupon_info
    where dt='2020-06-14';
    
  5. Daily load

    insert overwrite table dim_coupon_info partition(dt='2020-06-15')
    select
        id,
        coupon_name,
        coupon_type,
        condition_amount,
        condition_num,
        activity_id,
        benefit_amount,
        benefit_discount,
        create_time,
        range_type,
        limit_num,
        taken_count,
        start_time,
        end_time,
        operate_time,
        expire_time
    from ods_coupon_info
    where dt='2020-06-15';
    
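
Because this is a full-snapshot table, each day's load writes one dt partition from the matching ODS partition, and only the date changes between runs. If several days ever need to be loaded or reloaded, a date loop can drive the same statement. The sketch below only produces the list of dates; date_range is a hypothetical helper and it assumes GNU date.

```shell
# Sketch only: list every date in a range so a load statement can be run once per day.

# Print each date from $1 to $2 inclusive, one per line (YYYY-MM-DD, GNU date required).
date_range() {
    local cur="$1"
    local end="$2"
    # Compare as epoch seconds so the loop stops after the end date.
    while [ "$(date -d "$cur" +%s)" -le "$(date -d "$end" +%s)" ]; do
        echo "$cur"
        cur=$(date -d "$cur +1 day" +%F)
    done
}
```

A loop such as for d in $(date_range 2020-06-14 2020-06-16) could then run the insert overwrite statement above with dt set to each value of d in turn.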

# Activity dimension table (full snapshot)

  1. Table creation statement

    DROP TABLE IF EXISTS dim_activity_rule_info;
    CREATE EXTERNAL TABLE dim_activity_rule_info(
        `activity_rule_id` STRING COMMENT 'Activity rule ID',
        `activity_id` STRING COMMENT 'Activity ID',
        `activity_name` STRING COMMENT 'Activity name',
        `activity_type` STRING COMMENT 'Activity type',
        `start_time` STRING COMMENT 'Start time',
        `end_time` STRING COMMENT 'End time',
        `create_time` STRING COMMENT 'Creation time',
        `condition_amount` DECIMAL(16,2) COMMENT 'Threshold amount',
        `condition_num` BIGINT COMMENT 'Threshold quantity',
        `benefit_amount` DECIMAL(16,2) COMMENT 'Benefit amount',
        `benefit_discount` DECIMAL(16,2) COMMENT 'Benefit discount',
        `benefit_level` STRING COMMENT 'Benefit level'
    ) COMMENT 'Activity information table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_activity_rule_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Partition planning

  3. Data loading

  4. First-day load

    insert overwrite table dim_activity_rule_info partition(dt='2020-06-14')
    select
        ar.id,
        ar.activity_id,
        ai.activity_name,
        ar.activity_type,
        ai.start_time,
        ai.end_time,
        ai.create_time,
        ar.condition_amount,
        ar.condition_num,
        ar.benefit_amount,
        ar.benefit_discount,
        ar.benefit_level
    from
    (
        select
            id,
            activity_id,
            activity_type,
            condition_amount,
            condition_num,
            benefit_amount,
            benefit_discount,
            benefit_level
        from ods_activity_rule
        where dt='2020-06-14'
    )ar
    left join
    (
        select
            id,
            activity_name,
            start_time,
            end_time,
            create_time
        from ods_activity_info
        where dt='2020-06-14'
    )ai
    on ar.activity_id=ai.id;
    
  5. Daily load

    insert overwrite table dim_activity_rule_info partition(dt='2020-06-15')
    select
        ar.id,
        ar.activity_id,
        ai.activity_name,
        ar.activity_type,
        ai.start_time,
        ai.end_time,
        ai.create_time,
        ar.condition_amount,
        ar.condition_num,
        ar.benefit_amount,
        ar.benefit_discount,
        ar.benefit_level
    from
    (
        select
            id,
            activity_id,
            activity_type,
            condition_amount,
            condition_num,
            benefit_amount,
            benefit_discount,
            benefit_level
        from ods_activity_rule
        where dt='2020-06-15'
    )ar
    left join
    (
        select
            id,
            activity_name,
            start_time,
            end_time,
            create_time
        from ods_activity_info
        where dt='2020-06-15'
    )ai
    on ar.activity_id=ai.id;
    

# Region dimension table (special)

  1. Table creation statement

    DROP TABLE IF EXISTS dim_base_province;
    CREATE EXTERNAL TABLE dim_base_province (
        `id` STRING COMMENT 'id',
        `province_name` STRING COMMENT 'Province/city name',
        `area_code` STRING COMMENT 'Area code',
        `iso_code` STRING COMMENT 'ISO 3166 code, used for visualization',
        `iso_3166_2` STRING COMMENT 'ISO 3166-2 code, used for visualization',
        `region_id` STRING COMMENT 'Region ID',
        `region_name` STRING COMMENT 'Region name'
    ) COMMENT 'Region dimension table'
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_base_province/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data loading

    Region dimension data is relatively stable and rarely changes, so it does not need to be loaded daily.

    insert overwrite table dim_base_province
    select
        bp.id,
        bp.name,
        bp.area_code,
        bp.iso_code,
        bp.iso_3166_2,
        bp.region_id,
        br.region_name
    from ods_base_province bp
    join ods_base_region br on bp.region_id = br.id;
    

# Date dimension table (special)

  1. Table creation statement

    DROP TABLE IF EXISTS dim_date_info;
    CREATE EXTERNAL TABLE dim_date_info(
        `date_id` STRING COMMENT 'Date',
        `week_id` STRING COMMENT 'Week ID',
        `week_day` STRING COMMENT 'Day of the week',
        `day` STRING COMMENT 'Day of the month',
        `month` STRING COMMENT 'Month',
        `quarter` STRING COMMENT 'Quarter',
        `year` STRING COMMENT 'Year',
        `is_workday` STRING COMMENT 'Whether it is a workday',
        `holiday_id` STRING COMMENT 'Holiday'
    ) COMMENT 'Date dimension table'
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_date_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
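  2. Data loading

    The date dimension table is normally not sourced from the business system but written by hand. Because its contents are predictable, it does not need to be imported daily; typically a full year of data is imported in one go.

    1. Create a temporary table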

      DROP TABLE IF EXISTS tmp_dim_date_info;
      CREATE EXTERNAL TABLE tmp_dim_date_info (
          `date_id` STRING COMMENT 'Date',
          `week_id` STRING COMMENT 'Week ID',
          `week_day` STRING COMMENT 'Day of the week',
          `day` STRING COMMENT 'Day of the month',
          `month` STRING COMMENT 'Month',
          `quarter` STRING COMMENT 'Quarter',
          `year` STRING COMMENT 'Year',
          `is_workday` STRING COMMENT 'Whether it is a workday',
          `holiday_id` STRING COMMENT 'Holiday'
      ) COMMENT 'Date dimension table'
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/warehouse/gmall/tmp/tmp_dim_date_info/';
      
    2. Upload the data file to the temporary table's path on HDFS, /warehouse/gmall/tmp/tmp_dim_date_info/

    3. Run the following statement to import it into the date dimension table

      insert overwrite table dim_date_info select * from tmp_dim_date_info;
      
    4. Check that the data was imported successfully
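
The data file for tmp_dim_date_info is not included here. As a rough sketch, assuming GNU `date`, a tab-separated file with the nine columns could be generated as follows. The column semantics are assumptions, not taken from the original data file: week_id is the ISO week number, week_day runs from 1 (Monday) to 7 (Sunday), is_workday marks Monday to Friday as 1 and ignores public holidays, and holiday_id is left as Hive's NULL marker `\N`. The function name `gen_date_dim` is hypothetical.

```shell
#!/bin/bash
# Sketch: print one tab-separated row per day of the given year.
# Assumes GNU date (for -d and the %-d / %-m formats).
# Column meanings are assumptions, see the note above.
gen_date_dim() {
    local year=$1
    local d="$year-01-01"
    local week_id week_day day month quarter is_workday
    while [ "${d:0:4}" = "$year" ]; do
        # ISO week number, weekday (1 = Monday), day and month without leading zeros
        read -r week_id week_day day month <<< "$(TZ=UTC date -d "$d" '+%V %u %-d %-m')"
        quarter=$(( (month + 2) / 3 ))
        if [ "$week_day" -le 5 ]; then is_workday=1; else is_workday=0; fi
        # holiday_id is written as \N, which Hive text tables read as NULL
        printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
            "$d" "$week_id" "$week_day" "$day" "$month" "$quarter" "$year" "$is_workday" '\N'
        d=$(TZ=UTC date -d "$d +1 day" +%F)
    done
}

# Usage: gen_date_dim 2020 > date_info.txt
```

The resulting file could then be uploaded to the temporary table's HDFS path before running the insert statement. A real deployment would need a holiday calendar to fill is_workday and holiday_id properly.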

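# User dimension table (zipper table)

# Zipper table overview

Why build a zipper table

How to use a zipper table

How a zipper table is formed

# Building the zipper table

  1. Table creation statement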

    DROP TABLE IF EXISTS dim_user_info;
    CREATE EXTERNAL TABLE dim_user_info(
        `id` STRING COMMENT 'User ID',
        `login_name` STRING COMMENT 'Login name',
        `nick_name` STRING COMMENT 'Nickname',
        `name` STRING COMMENT 'Real name',
        `phone_num` STRING COMMENT 'Phone number',
        `email` STRING COMMENT 'Email',
        `user_level` STRING COMMENT 'User level',
        `birthday` STRING COMMENT 'Birthday',
        `gender` STRING COMMENT 'Gender',
        `create_time` STRING COMMENT 'Creation time',
        `operate_time` STRING COMMENT 'Operation time',
        `start_date` STRING COMMENT 'Start date',
        `end_date` STRING COMMENT 'End date'
    ) COMMENT 'User table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dim/dim_user_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
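  2. Partition planning

  3. Data loading

  4. First-day load

    The first-day load initializes the zipper table: all historical users up to and including the initialization date are imported into the zipper table in one go. The first partition of ods_user_info, 2020-06-14, currently holds all historical users, so that partition's data is processed and then loaded into the 9999-99-99 partition of the zipper table.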

    insert overwrite table dim_user_info partition(dt='9999-99-99')
    select
        id,
        login_name,
        nick_name,
        md5(name),
        md5(phone_num),
        md5(email),
        user_level,
        birthday,
        gender,
        create_time,
        operate_time,
        '2020-06-14',
        '9999-99-99'
    from ods_user_info
    where dt='2020-06-14';
    
  5. Daily load

    1. Writing the SQL

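      -- partition(dt) below is fully dynamic, so nonstrict mode is required
      set hive.exec.dynamic.partition.mode=nonstrict;
      with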
      tmp as
      (
          select
              old.id old_id,
              old.login_name old_login_name,
              old.nick_name old_nick_name,
              old.name old_name,
              old.phone_num old_phone_num,
              old.email old_email,
              old.user_level old_user_level,
              old.birthday old_birthday,
              old.gender old_gender,
              old.create_time old_create_time,
              old.operate_time old_operate_time,
              old.start_date old_start_date,
              old.end_date old_end_date,
              new.id new_id,
              new.login_name new_login_name,
              new.nick_name new_nick_name,
              new.name new_name,
              new.phone_num new_phone_num,
              new.email new_email,
              new.user_level new_user_level,
              new.birthday new_birthday,
              new.gender new_gender,
              new.create_time new_create_time,
              new.operate_time new_operate_time,
              new.start_date new_start_date,
              new.end_date new_end_date
          from
          (
              select
                  id,
                  login_name,
                  nick_name,
                  name,
                  phone_num,
                  email,
                  user_level,
                  birthday,
                  gender,
                  create_time,
                  operate_time,
                  start_date,
                  end_date
              from dim_user_info
              where dt='9999-99-99'
          )old
          full outer join
          (
              select
                  id,
                  login_name,
                  nick_name,
                  md5(name) name,
                  md5(phone_num) phone_num,
                  md5(email) email,
                  user_level,
                  birthday,
                  gender,
                  create_time,
                  operate_time,
                  '2020-06-15' start_date,
                  '9999-99-99' end_date
              from ods_user_info
              where dt='2020-06-15'
          )new
          on old.id=new.id
      )
      insert overwrite table dim_user_info partition(dt)
      select
          nvl(new_id,old_id),
          nvl(new_login_name,old_login_name),
          nvl(new_nick_name,old_nick_name),
          nvl(new_name,old_name),
          nvl(new_phone_num,old_phone_num),
          nvl(new_email,old_email),
          nvl(new_user_level,old_user_level),
          nvl(new_birthday,old_birthday),
          nvl(new_gender,old_gender),
          nvl(new_create_time,old_create_time),
          nvl(new_operate_time,old_operate_time),
          nvl(new_start_date,old_start_date),
          nvl(new_end_date,old_end_date),
          nvl(new_end_date,old_end_date) dt
      from tmp
      union all
      select
          old_id,
          old_login_name,
          old_nick_name,
          old_name,
          old_phone_num,
          old_email,
          old_user_level,
          old_birthday,
          old_gender,
          old_create_time,
          old_operate_time,
          old_start_date,
          cast(date_add('2020-06-15',-1) as string),
          cast(date_add('2020-06-15',-1) as string) dt
      from tmp
      where new_id is not null and old_id is not null;
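
With the table loaded this way, the 9999-99-99 partition holds the currently valid row for each user, and each dated partition holds the rows that expired on that date. A full snapshot as of a given day can then, in principle, be recovered by filtering on start_date and end_date. The helper below is a minimal sketch that only builds the query text; the function name `zipper_snapshot_sql` is hypothetical, and the `gmall` database name is the one used by the loading scripts in this document.

```shell
#!/bin/bash
# Sketch: build a HiveQL query returning the rows of dim_user_info
# that were valid on the given date (start_date <= date <= end_date).
# Dates are yyyy-MM-dd strings, so plain string comparison orders them correctly.
zipper_snapshot_sql() {
    local snapshot_date=$1
    local app=${2:-gmall}   # database name; gmall is the value used in this document
    cat <<EOF
select *
from ${app}.dim_user_info
where start_date<='${snapshot_date}'
and end_date>='${snapshot_date}';
EOF
}

# Usage (requires a working hive client):
#   hive -e "$(zipper_snapshot_sql 2020-06-14)"
```

Because dt always equals end_date in this table, adding a condition on dt with the same lower bound should let Hive prune partitions, though that is left out of the sketch to keep it short.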
      

# DIM layer first-day load script

  1. Write the script

  2. Create the script ods_to_dim_db_init.sh in the /home/damoncai/bin directory

    vim ods_to_dim_db_init.sh
    
    #!/bin/bash
    
    APP=gmall
    
    if [ -n "$2" ] ;then
       do_date=$2
    else
       echo "Please pass in a date argument"
       exit 1
    fi
    
    dim_user_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_user_info partition(dt='9999-99-99')
    select
        id,
        login_name,
        nick_name,
        md5(name),
        md5(phone_num),
        md5(email),
        user_level,
        birthday,
        gender,
        create_time,
        operate_time,
        '$do_date',
        '9999-99-99'
    from ${APP}.ods_user_info
    where dt='$do_date';
    "
    
    dim_sku_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    with
    sku as
    (
        select
            id,
            price,
            sku_name,
            sku_desc,
            weight,
            is_sale,
            spu_id,
            category3_id,
            tm_id,
            create_time
        from ${APP}.ods_sku_info
        where dt='$do_date'
    ),
    spu as
    (
        select
            id,
            spu_name
        from ${APP}.ods_spu_info
        where dt='$do_date'
    ),
    c3 as
    (
        select
            id,
            name,
            category2_id
        from ${APP}.ods_base_category3
        where dt='$do_date'
    ),
    c2 as
    (
        select
            id,
            name,
            category1_id
        from ${APP}.ods_base_category2
        where dt='$do_date'
    ),
    c1 as
    (
        select
            id,
            name
        from ${APP}.ods_base_category1
        where dt='$do_date'
    ),
    tm as
    (
        select
            id,
            tm_name
        from ${APP}.ods_base_trademark
        where dt='$do_date'
    ),
    attr as
    (
        select
            sku_id,
            collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
        from ${APP}.ods_sku_attr_value
        where dt='$do_date'
        group by sku_id
    ),
    sale_attr as
    (
        select
            sku_id,
            collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
        from ${APP}.ods_sku_sale_attr_value
        where dt='$do_date'
        group by sku_id
    )
    
    insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date')
    select
        sku.id,
        sku.price,
        sku.sku_name,
        sku.sku_desc,
        sku.weight,
        sku.is_sale,
        sku.spu_id,
        spu.spu_name,
        sku.category3_id,
        c3.name,
        c3.category2_id,
        c2.name,
        c2.category1_id,
        c1.name,
        sku.tm_id,
        tm.tm_name,
        attr.attrs,
        sale_attr.sale_attrs,
        sku.create_time
    from sku
    left join spu on sku.spu_id=spu.id
    left join c3 on sku.category3_id=c3.id
    left join c2 on c3.category2_id=c2.id
    left join c1 on c2.category1_id=c1.id
    left join tm on sku.tm_id=tm.id
    left join attr on sku.id=attr.sku_id
    left join sale_attr on sku.id=sale_attr.sku_id;
    "
    
    dim_base_province="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_base_province
    select
        bp.id,
        bp.name,
        bp.area_code,
        bp.iso_code,
        bp.iso_3166_2,
        bp.region_id,
        br.region_name
    from ${APP}.ods_base_province bp
    join ${APP}.ods_base_region br on bp.region_id = br.id;
    "
    
    dim_coupon_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date')
    select
        id,
        coupon_name,
        coupon_type,
        condition_amount,
        condition_num,
        activity_id,
        benefit_amount,
        benefit_discount,
        create_time,
        range_type,
        limit_num,
        taken_count,
        start_time,
        end_time,
        operate_time,
        expire_time
    from ${APP}.ods_coupon_info
    where dt='$do_date';
    "
    
    dim_activity_rule_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date')
    select
        ar.id,
        ar.activity_id,
        ai.activity_name,
        ar.activity_type,
        ai.start_time,
        ai.end_time,
        ai.create_time,
        ar.condition_amount,
        ar.condition_num,
        ar.benefit_amount,
        ar.benefit_discount,
        ar.benefit_level
    from
    (
        select
            id,
            activity_id,
            activity_type,
            condition_amount,
            condition_num,
            benefit_amount,
            benefit_discount,
            benefit_level
        from ${APP}.ods_activity_rule
        where dt='$do_date'
    )ar
    left join
    (
        select
            id,
            activity_name,
            start_time,
            end_time,
            create_time
        from ${APP}.ods_activity_info
        where dt='$do_date'
    )ai
    on ar.activity_id=ai.id;
    "
    
    case $1 in
    "dim_user_info"){
        hive -e "$dim_user_info"
    };;
    "dim_sku_info"){
        hive -e "$dim_sku_info"
    };;
    "dim_base_province"){
        hive -e "$dim_base_province"
    };;
    "dim_coupon_info"){
        hive -e "$dim_coupon_info"
    };;
    "dim_activity_rule_info"){
        hive -e "$dim_activity_rule_info"
    };;
    "all"){
        hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info$dim_base_province"
    };;
    esac
    
  3. Add execute permission

  4. Run the script

    ods_to_dim_db_init.sh all 2020-06-14
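
The script uses the date argument as-is. A small guard in front of the `hive -e` calls could reject malformed dates before any partition is overwritten. This is a sketch assuming GNU `date`; `is_valid_date` is a hypothetical helper, not part of the original script.

```shell
#!/bin/bash
# Sketch: succeed only if the argument is a real calendar date in YYYY-MM-DD form.
# Assumes GNU date (for -d).
is_valid_date() {
    local input=$1
    # Check the shape first, so inputs such as "yesterday" are rejected.
    [[ $input =~ ^[0-9]{4}-[0-9]{2}-[0-9]{2}$ ]] || return 1
    # Let date(1) verify the calendar, and require a round trip to the same string.
    [ "$(date -d "$input" +%F 2>/dev/null)" = "$input" ]
}

# Possible use inside ods_to_dim_db_init.sh:
#   is_valid_date "$2" || { echo "Invalid date: $2"; exit 1; }
```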
    

# DIM layer daily load script

  1. Write the script

  2. Create the script ods_to_dim_db.sh in the /home/damoncai/bin directory

    vim ods_to_dim_db.sh
    
    #!/bin/bash
    
    APP=gmall
    
    # Use the date passed in, if any; otherwise default to the previous day
    if [ -n "$2" ] ;then
        do_date=$2
    else 
        do_date=`date -d "-1 day" +%F`
    fi
    
    dim_user_info="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    with
    tmp as
    (
        select
            old.id old_id,
            old.login_name old_login_name,
            old.nick_name old_nick_name,
            old.name old_name,
            old.phone_num old_phone_num,
            old.email old_email,
            old.user_level old_user_level,
            old.birthday old_birthday,
            old.gender old_gender,
            old.create_time old_create_time,
            old.operate_time old_operate_time,
            old.start_date old_start_date,
            old.end_date old_end_date,
            new.id new_id,
            new.login_name new_login_name,
            new.nick_name new_nick_name,
            new.name new_name,
            new.phone_num new_phone_num,
            new.email new_email,
            new.user_level new_user_level,
            new.birthday new_birthday,
            new.gender new_gender,
            new.create_time new_create_time,
            new.operate_time new_operate_time,
            new.start_date new_start_date,
            new.end_date new_end_date
        from
        (
            select
                id,
                login_name,
                nick_name,
                name,
                phone_num,
                email,
                user_level,
                birthday,
                gender,
                create_time,
                operate_time,
                start_date,
                end_date
            from ${APP}.dim_user_info
            where dt='9999-99-99'
            and start_date<'$do_date'
        )old
        full outer join
        (
            select
                id,
                login_name,
                nick_name,
                md5(name) name,
                md5(phone_num) phone_num,
                md5(email) email,
                user_level,
                birthday,
                gender,
                create_time,
                operate_time,
                '$do_date' start_date,
                '9999-99-99' end_date
            from ${APP}.ods_user_info
            where dt='$do_date'
        )new
        on old.id=new.id
    )
    insert overwrite table ${APP}.dim_user_info partition(dt)
    select
        nvl(new_id,old_id),
        nvl(new_login_name,old_login_name),
        nvl(new_nick_name,old_nick_name),
        nvl(new_name,old_name),
        nvl(new_phone_num,old_phone_num),
        nvl(new_email,old_email),
        nvl(new_user_level,old_user_level),
        nvl(new_birthday,old_birthday),
        nvl(new_gender,old_gender),
        nvl(new_create_time,old_create_time),
        nvl(new_operate_time,old_operate_time),
        nvl(new_start_date,old_start_date),
        nvl(new_end_date,old_end_date),
        nvl(new_end_date,old_end_date) dt
    from tmp
    union all
    select
        old_id,
        old_login_name,
        old_nick_name,
        old_name,
        old_phone_num,
        old_email,
        old_user_level,
        old_birthday,
        old_gender,
        old_create_time,
        old_operate_time,
        old_start_date,
        cast(date_add('$do_date',-1) as string),
        cast(date_add('$do_date',-1) as string) dt
    from tmp
    where new_id is not null and old_id is not null;
    "
    
    dim_sku_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    with
    sku as
    (
        select
            id,
            price,
            sku_name,
            sku_desc,
            weight,
            is_sale,
            spu_id,
            category3_id,
            tm_id,
            create_time
        from ${APP}.ods_sku_info
        where dt='$do_date'
    ),
    spu as
    (
        select
            id,
            spu_name
        from ${APP}.ods_spu_info
        where dt='$do_date'
    ),
    c3 as
    (
        select
            id,
            name,
            category2_id
        from ${APP}.ods_base_category3
        where dt='$do_date'
    ),
    c2 as
    (
        select
            id,
            name,
            category1_id
        from ${APP}.ods_base_category2
        where dt='$do_date'
    ),
    c1 as
    (
        select
            id,
            name
        from ${APP}.ods_base_category1
        where dt='$do_date'
    ),
    tm as
    (
        select
            id,
            tm_name
        from ${APP}.ods_base_trademark
        where dt='$do_date'
    ),
    attr as
    (
        select
            sku_id,
            collect_set(named_struct('attr_id',attr_id,'value_id',value_id,'attr_name',attr_name,'value_name',value_name)) attrs
        from ${APP}.ods_sku_attr_value
        where dt='$do_date'
        group by sku_id
    ),
    sale_attr as
    (
        select
            sku_id,
            collect_set(named_struct('sale_attr_id',sale_attr_id,'sale_attr_value_id',sale_attr_value_id,'sale_attr_name',sale_attr_name,'sale_attr_value_name',sale_attr_value_name)) sale_attrs
        from ${APP}.ods_sku_sale_attr_value
        where dt='$do_date'
        group by sku_id
    )
    
    insert overwrite table ${APP}.dim_sku_info partition(dt='$do_date')
    select
        sku.id,
        sku.price,
        sku.sku_name,
        sku.sku_desc,
        sku.weight,
        sku.is_sale,
        sku.spu_id,
        spu.spu_name,
        sku.category3_id,
        c3.name,
        c3.category2_id,
        c2.name,
        c2.category1_id,
        c1.name,
        sku.tm_id,
        tm.tm_name,
        attr.attrs,
        sale_attr.sale_attrs,
        sku.create_time
    from sku
    left join spu on sku.spu_id=spu.id
    left join c3 on sku.category3_id=c3.id
    left join c2 on c3.category2_id=c2.id
    left join c1 on c2.category1_id=c1.id
    left join tm on sku.tm_id=tm.id
    left join attr on sku.id=attr.sku_id
    left join sale_attr on sku.id=sale_attr.sku_id;
    "
    
    dim_base_province="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_base_province
    select
        bp.id,
        bp.name,
        bp.area_code,
        bp.iso_code,
        bp.iso_3166_2,
        bp.region_id,
        br.region_name
    from ${APP}.ods_base_province bp
    join ${APP}.ods_base_region br on bp.region_id = br.id;
    "
    
    dim_coupon_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_coupon_info partition(dt='$do_date')
    select
        id,
        coupon_name,
        coupon_type,
        condition_amount,
        condition_num,
        activity_id,
        benefit_amount,
        benefit_discount,
        create_time,
        range_type,
        limit_num,
        taken_count,
        start_time,
        end_time,
        operate_time,
        expire_time
    from ${APP}.ods_coupon_info
    where dt='$do_date';
    "
    
    dim_activity_rule_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dim_activity_rule_info partition(dt='$do_date')
    select
        ar.id,
        ar.activity_id,
        ai.activity_name,
        ar.activity_type,
        ai.start_time,
        ai.end_time,
        ai.create_time,
        ar.condition_amount,
        ar.condition_num,
        ar.benefit_amount,
        ar.benefit_discount,
        ar.benefit_level
    from
    (
        select
            id,
            activity_id,
            activity_type,
            condition_amount,
            condition_num,
            benefit_amount,
            benefit_discount,
            benefit_level
        from ${APP}.ods_activity_rule
        where dt='$do_date'
    )ar
    left join
    (
        select
            id,
            activity_name,
            start_time,
            end_time,
            create_time
        from ${APP}.ods_activity_info
        where dt='$do_date'
    )ai
    on ar.activity_id=ai.id;
    "
    
    case $1 in
    "dim_user_info"){
        hive -e "$dim_user_info"
    };;
    "dim_sku_info"){
        hive -e "$dim_sku_info"
    };;
    "dim_base_province"){
        hive -e "$dim_base_province"
    };;
    "dim_coupon_info"){
        hive -e "$dim_coupon_info"
    };;
    "dim_activity_rule_info"){
        hive -e "$dim_activity_rule_info"
    };;
    "all"){
        hive -e "$dim_user_info$dim_sku_info$dim_coupon_info$dim_activity_rule_info"
    };;
    esac
    
  3. Add execute permission

  4. Run the script

    ods_to_dim_db.sh all 2020-06-14
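
The date handling at the top of the daily script (use the second argument if given, otherwise yesterday) can be isolated and tried on its own. The sketch below assumes GNU `date`; the function name `resolve_do_date` is hypothetical.

```shell
#!/bin/bash
# Sketch: return the explicit date if one is passed, otherwise yesterday's date.
# Assumes GNU date (for -d).
resolve_do_date() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        date -d "-1 day" +%F
    fi
}

# Usage:
#   resolve_do_date 2020-06-15   # prints the date that was passed in
#   resolve_do_date              # prints yesterday's date
```

Defaulting to the previous day suits a job that runs shortly after midnight and processes the day that has just ended.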
    

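# Data warehouse build: DWD layer

# DWD layer (user behavior logs)

# Log parsing approach

  1. Log structure review

    1. Page tracking logs

    2. Start logs

  2. Log parsing approach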

# Using the get_json_object function

  1. Data

    [{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]
    
  2. Extract the first JSON object

    hive (gmall)>
    select get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]','$[0]');
    
    Result: {"name":"大郎","sex":"男","age":"25"}

  3. Extract the value of the age field from the first JSON object

    hive (gmall)>
    SELECT get_json_object('[{"name":"大郎","sex":"男","age":"25"},{"name":"西门庆","sex":"男","age":"47"}]',"$[0].age");
    
    Result: 25

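# Start log table

**Start-log parsing approach:** Each row in the start log table corresponds to one app-start record, which should contain the log's common fields plus the start fields. First filter out all log lines that contain a start field, then parse each field with the get_json_object function.

  1. Table creation statement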

    DROP TABLE IF EXISTS dwd_start_log;
    CREATE EXTERNAL TABLE dwd_start_log(
        `area_code` STRING COMMENT 'Area code',
        `brand` STRING COMMENT 'Phone brand',
        `channel` STRING COMMENT 'Channel',
        `is_new` STRING COMMENT 'Whether this is the first launch',
        `model` STRING COMMENT 'Phone model',
        `mid_id` STRING COMMENT 'Device ID',
        `os` STRING COMMENT 'Operating system',
        `user_id` STRING COMMENT 'Member ID',
        `version_code` STRING COMMENT 'App version',
        `entry` STRING COMMENT 'Entry point: icon = app icon, notice = notification, install = launch after installation',
        `loading_time` BIGINT COMMENT 'Launch loading time',
        `open_ad_id` STRING COMMENT 'Ad page ID',
        `open_ad_ms` BIGINT COMMENT 'Total ad play time',
        `open_ad_skip_ms` BIGINT COMMENT 'Time at which the user skipped the ad',
        `ts` BIGINT COMMENT 'Timestamp'
    ) COMMENT 'Start log table'
    PARTITIONED BY (`dt` STRING) -- partition by date
    STORED AS PARQUET -- Parquet columnar storage
    LOCATION '/warehouse/gmall/dwd/dwd_start_log' -- storage location on HDFS
    TBLPROPERTIES('parquet.compression'='lzo') -- LZO compression
    ;
    
    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
  2. 数据导入

    insert overwrite table dwd_start_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.start.entry'),
        get_json_object(line,'$.start.loading_time'),
        get_json_object(line,'$.start.open_ad_id'),
        get_json_object(line,'$.start.open_ad_ms'),
        get_json_object(line,'$.start.open_ad_skip_ms'),
        get_json_object(line,'$.ts')
    from ods_log
    where dt='2020-06-14'
    and get_json_object(line,'$.start') is not null;
    
  3. View data

    select * from dwd_start_log where dt='2020-06-14' limit 2;
    

# Page log table

**Page log parsing approach:** Each row in the page log table corresponds to one page visit record, which should contain the common information and the page information from the log. First filter the logs that contain the page field, then parse each field with the get_json_object function.

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_page_log;
    CREATE EXTERNAL TABLE dwd_page_log(
        `area_code` STRING COMMENT 'Area code',
        `brand` STRING COMMENT 'Phone brand',
        `channel` STRING COMMENT 'Channel',
        `is_new` STRING COMMENT 'Whether this is the first launch',
        `model` STRING COMMENT 'Phone model',
        `mid_id` STRING COMMENT 'Device ID',
        `os` STRING COMMENT 'Operating system',
        `user_id` STRING COMMENT 'Member ID',
        `version_code` STRING COMMENT 'App version number',
        `during_time` BIGINT COMMENT 'Duration in milliseconds',
        `page_item` STRING COMMENT 'Target ID',
        `page_item_type` STRING COMMENT 'Target type',
        `last_page_id` STRING COMMENT 'Previous page ID',
        `page_id` STRING COMMENT 'Page ID',
        `source_type` STRING COMMENT 'Source type',
        `ts` BIGINT COMMENT 'Timestamp'
    ) COMMENT 'Page log table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_page_log'
    TBLPROPERTIES('parquet.compression'='lzo');

  2. Load data

    insert overwrite table dwd_page_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.during_time'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(line,'$.ts')
    from ods_log
    where dt='2020-06-14'
    and get_json_object(line,'$.page') is not null;
    
  3. View data

    select * from dwd_page_log where dt='2020-06-14' limit 2;
    

# Action log table

**Action log parsing approach:** Each row in the action log table corresponds to one user action record, which should contain the common information, the page information, and the action information. First filter the logs that contain the actions field, then use a UDTF to "explode" the actions array (similar to the effect of the explode function), and finally parse each field with the get_json_object function.

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_action_log;
    CREATE EXTERNAL TABLE dwd_action_log(
        `area_code` STRING COMMENT 'Area code',
        `brand` STRING COMMENT 'Phone brand',
        `channel` STRING COMMENT 'Channel',
        `is_new` STRING COMMENT 'Whether this is the first launch',
        `model` STRING COMMENT 'Phone model',
        `mid_id` STRING COMMENT 'Device ID',
        `os` STRING COMMENT 'Operating system',
        `user_id` STRING COMMENT 'Member ID',
        `version_code` STRING COMMENT 'App version number',
        `during_time` BIGINT COMMENT 'Duration in milliseconds',
        `page_item` STRING COMMENT 'Target ID',
        `page_item_type` STRING COMMENT 'Target type',
        `last_page_id` STRING COMMENT 'Previous page ID',
        `page_id` STRING COMMENT 'Page ID',
        `source_type` STRING COMMENT 'Source type',
        `action_id` STRING COMMENT 'Action ID',
        `item` STRING COMMENT 'Target ID',
        `item_type` STRING COMMENT 'Target type',
        `ts` BIGINT COMMENT 'Timestamp'
    ) COMMENT 'Action log table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_action_log'
    TBLPROPERTIES('parquet.compression'='lzo');

  2. Create the UDTF: design approach

  3. Create the UDTF: write the code

    1. Add the following dependency

      <dependencies>
          <!-- Hive dependency -->
          <dependency>
              <groupId>org.apache.hive</groupId>
              <artifactId>hive-exec</artifactId>
              <version>3.1.2</version>
          </dependency>
      </dependencies>

    2. Write the code

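      package top.damoncai.udtf;

      import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
      import org.apache.hadoop.hive.ql.metadata.HiveException;
      import org.apache.hadoop.hive.ql.udf.generic.GenericUDTF;
      import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
      import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspectorFactory;
      import org.apache.hadoop.hive.serde2.objectinspector.PrimitiveObjectInspector;
      import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
      import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
      // Assumption: org.json is available through hive-exec, since no separate JSON dependency is declared above
      import org.json.JSONArray;

      import java.util.ArrayList;
      import java.util.List;

      public class ExplodeJSONArray extends GenericUDTF {

          @Override
          public StructObjectInspector initialize(ObjectInspector[] argOIs) throws UDFArgumentException {

              // 1. Validate the number of arguments
              if (argOIs.length != 1) {
                  throw new UDFArgumentException("explode_json_array takes exactly one argument");
              }

              // 2. The argument must be a string
              // Check that the argument is a primitive type
              if (argOIs[0].getCategory() != ObjectInspector.Category.PRIMITIVE) {
                  throw new UDFArgumentException("explode_json_array only accepts primitive-type arguments");
              }

              // Cast the argument's object inspector to a primitive object inspector
              PrimitiveObjectInspector argumentOI = (PrimitiveObjectInspector) argOIs[0];

              // Check that the primitive type is STRING
              if (argumentOI.getPrimitiveCategory() != PrimitiveObjectInspector.PrimitiveCategory.STRING) {
                  throw new UDFArgumentException("explode_json_array only accepts a string argument");
              }

              // 3. Define the name and type of the output column
              List<String> fieldNames = new ArrayList<String>();
              List<ObjectInspector> fieldOIs = new ArrayList<ObjectInspector>();

              fieldNames.add("items");
              fieldOIs.add(PrimitiveObjectInspectorFactory.javaStringObjectInspector);

              return ObjectInspectorFactory.getStandardStructObjectInspector(fieldNames, fieldOIs);
          }

          @Override
          public void process(Object[] objects) throws HiveException {

              // Skip rows whose argument is NULL (for example, logs without the array)
              if (objects[0] == null) {
                  return;
              }

              // 1. Read the input value
              String jsonArray = objects[0].toString();

              // 2. Convert the string to a JSON array
              JSONArray actions = new JSONArray(jsonArray);

              // 3. Emit one output row per element of the array
              for (int i = 0; i < actions.length(); i++) {

                  String[] result = new String[1];
                  result[0] = actions.getString(i);
                  forward(result);
              }
          }

          @Override
          public void close() throws HiveException {

          }

      }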
      
  4. Create the function

    1. Build the jar

    2. Upload 02_hive_udtf-1.0-SNAPSHOT.jar to /opt/module on ha01, then upload the jar to the /user/hive/jars path on HDFS

      hadoop fs -mkdir -p /user/hive/jars
      hadoop fs -put 02_hive_udtf-1.0-SNAPSHOT.jar /user/hive/jars

    3. Create a permanent function bound to the compiled Java class

      create function explode_json_array as 'top.damoncai.udtf.ExplodeJSONArray' using jar 'hdfs://ha01:8020/user/hive/jars/02_hive_udtf-1.0-SNAPSHOT.jar';

      What if the custom function is modified and the jar rebuilt? Simply replace the old jar at the HDFS path, then restart the Hive client.

  5. Load data

    insert overwrite table dwd_action_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.during_time'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(action,'$.action_id'),
        get_json_object(action,'$.item'),
        get_json_object(action,'$.item_type'),
        get_json_object(action,'$.ts')
    from ods_log lateral view explode_json_array(get_json_object(line,'$.actions')) tmp as action
    where dt='2020-06-14'
    and get_json_object(line,'$.actions') is not null;
    
  6. View data

    select * from dwd_action_log where dt='2020-06-14' limit 2;
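
The note in step 4 says that after the UDTF is rebuilt, the old jar on HDFS is replaced and the Hive client restarted. The replace step could be scripted roughly as follows. This is only a sketch: the function name `refresh_udtf_jar` and the `HADOOP_CMD` variable are hypothetical helpers that are not part of the original setup, and `-f` asks `hadoop fs -put` to overwrite a file that already exists.

```shell
#!/bin/bash
# Hypothetical helper (not from the original notes): re-upload a rebuilt UDTF jar.
# HADOOP_CMD is an assumed override hook so the commands can be dry-run, e.g. HADOOP_CMD=echo.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

refresh_udtf_jar() {
    local jar="$1"                          # local path of the rebuilt jar
    local hdfs_dir="${2:-/user/hive/jars}"  # HDFS directory used in step 4

    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi

    # Create the target directory if needed, then overwrite the old jar (-f).
    "$HADOOP_CMD" fs -mkdir -p "$hdfs_dir" &&
    "$HADOOP_CMD" fs -put -f "$jar" "$hdfs_dir"
}
```

A call such as `refresh_udtf_jar path/to/rebuilt.jar` would then be followed by restarting the Hive client so that the new class is picked up.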
    

# Display log table

**Display log parsing approach:** Each row in the display log table corresponds to one display (exposure) record, which should contain the common information, the page information, and the display information. First filter the logs that contain the displays field, then use a UDTF to "explode" the displays array (similar to the effect of the explode function), and finally parse each field with the get_json_object function.

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_display_log;
    CREATE EXTERNAL TABLE dwd_display_log(
        `area_code` STRING COMMENT 'Area code',
        `brand` STRING COMMENT 'Phone brand',
        `channel` STRING COMMENT 'Channel',
        `is_new` STRING COMMENT 'Whether this is the first launch',
        `model` STRING COMMENT 'Phone model',
        `mid_id` STRING COMMENT 'Device ID',
        `os` STRING COMMENT 'Operating system',
        `user_id` STRING COMMENT 'Member ID',
        `version_code` STRING COMMENT 'App version number',
        `during_time` BIGINT COMMENT 'Duration in milliseconds',
        `page_item` STRING COMMENT 'Target ID',
        `page_item_type` STRING COMMENT 'Target type',
        `last_page_id` STRING COMMENT 'Previous page ID',
        `page_id` STRING COMMENT 'Page ID',
        `source_type` STRING COMMENT 'Source type',
        `ts` BIGINT COMMENT 'Timestamp',
        `display_type` STRING COMMENT 'Display type',
        `item` STRING COMMENT 'Displayed object ID',
        `item_type` STRING COMMENT 'Displayed object type',
        `order` BIGINT COMMENT 'Display order',
        `pos_id` BIGINT COMMENT 'Display position'
    ) COMMENT 'Display log table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_display_log'
    TBLPROPERTIES('parquet.compression'='lzo');

  2. Load data

    insert overwrite table dwd_display_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.during_time'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(line,'$.ts'),
        get_json_object(display,'$.display_type'),
        get_json_object(display,'$.item'),
        get_json_object(display,'$.item_type'),
        get_json_object(display,'$.order'),
        get_json_object(display,'$.pos_id')
    from ods_log lateral view explode_json_array(get_json_object(line,'$.displays')) tmp as display
    where dt='2020-06-14'
    and get_json_object(line,'$.displays') is not null;
    
  3. View data

    select * from dwd_display_log where dt='2020-06-14' limit 2;
    

# Error log table

**Error log parsing approach:** Each row in the error log table corresponds to one error record. To make errors easier to locate, an error record should contain the corresponding common information, page information, display information, action information, startup information, and error information. First filter the logs that contain the err field, then parse all fields with the get_json_object function.

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_error_log;
    CREATE EXTERNAL TABLE dwd_error_log(
        `area_code` STRING COMMENT 'Area code',
        `brand` STRING COMMENT 'Phone brand',
        `channel` STRING COMMENT 'Channel',
        `is_new` STRING COMMENT 'Whether this is the first launch',
        `model` STRING COMMENT 'Phone model',
        `mid_id` STRING COMMENT 'Device ID',
        `os` STRING COMMENT 'Operating system',
        `user_id` STRING COMMENT 'Member ID',
        `version_code` STRING COMMENT 'App version number',
        `page_item` STRING COMMENT 'Target ID',
        `page_item_type` STRING COMMENT 'Target type',
        `last_page_id` STRING COMMENT 'Previous page ID',
        `page_id` STRING COMMENT 'Page ID',
        `source_type` STRING COMMENT 'Source type',
        `entry` STRING COMMENT 'Entry: icon = app icon, notice = notification, install = launch after installation',
        `loading_time` STRING COMMENT 'Startup loading time',
        `open_ad_id` STRING COMMENT 'Ad page ID',
        `open_ad_ms` STRING COMMENT 'Total ad play time',
        `open_ad_skip_ms` STRING COMMENT 'Time at which the user skipped the ad',
        `actions` STRING COMMENT 'Actions',
        `displays` STRING COMMENT 'Displays',
        `ts` STRING COMMENT 'Timestamp',
        `error_code` STRING COMMENT 'Error code',
        `msg` STRING COMMENT 'Error message'
    ) COMMENT 'Error log table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_error_log'
    TBLPROPERTIES('parquet.compression'='lzo');

    Note: the actions array and the displays array are not processed here. To analyze how an error relates to an individual action or display, first apply explode_json_array.

  2. Load data

    insert overwrite table dwd_error_log partition(dt='2020-06-14')
    select
        get_json_object(line,'$.common.ar'),
        get_json_object(line,'$.common.ba'),
        get_json_object(line,'$.common.ch'),
        get_json_object(line,'$.common.is_new'),
        get_json_object(line,'$.common.md'),
        get_json_object(line,'$.common.mid'),
        get_json_object(line,'$.common.os'),
        get_json_object(line,'$.common.uid'),
        get_json_object(line,'$.common.vc'),
        get_json_object(line,'$.page.item'),
        get_json_object(line,'$.page.item_type'),
        get_json_object(line,'$.page.last_page_id'),
        get_json_object(line,'$.page.page_id'),
        get_json_object(line,'$.page.source_type'),
        get_json_object(line,'$.start.entry'),
        get_json_object(line,'$.start.loading_time'),
        get_json_object(line,'$.start.open_ad_id'),
        get_json_object(line,'$.start.open_ad_ms'),
        get_json_object(line,'$.start.open_ad_skip_ms'),
        get_json_object(line,'$.actions'),
        get_json_object(line,'$.displays'),
        get_json_object(line,'$.ts'),
        get_json_object(line,'$.err.error_code'),
        get_json_object(line,'$.err.msg')
    from ods_log
    where dt='2020-06-14'
    and get_json_object(line,'$.err') is not null;
    
  3. View data

    select * from dwd_error_log where dt='2020-06-14' limit 2;
    

# DWD layer user behavior data loading script

  1. Write the script

    1. Create the script ods_to_dwd_log.sh in the /home/damoncai/bin directory on ha01

      #!/bin/bash
      
      APP=gmall
      # Use the date passed as the second argument if given; otherwise default to yesterday
      if [ -n "$2" ] ;then
          do_date=$2
      else 
          do_date=`date -d "-1 day" +%F`
      fi
      
      dwd_start_log="
      set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
      insert overwrite table ${APP}.dwd_start_log partition(dt='$do_date')
      select
          get_json_object(line,'$.common.ar'),
          get_json_object(line,'$.common.ba'),
          get_json_object(line,'$.common.ch'),
          get_json_object(line,'$.common.is_new'),
          get_json_object(line,'$.common.md'),
          get_json_object(line,'$.common.mid'),
          get_json_object(line,'$.common.os'),
          get_json_object(line,'$.common.uid'),
          get_json_object(line,'$.common.vc'),
          get_json_object(line,'$.start.entry'),
          get_json_object(line,'$.start.loading_time'),
          get_json_object(line,'$.start.open_ad_id'),
          get_json_object(line,'$.start.open_ad_ms'),
          get_json_object(line,'$.start.open_ad_skip_ms'),
          get_json_object(line,'$.ts')
      from ${APP}.ods_log
      where dt='$do_date'
      and get_json_object(line,'$.start') is not null;"
      
      dwd_page_log="
      set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
      insert overwrite table ${APP}.dwd_page_log partition(dt='$do_date')
      select
          get_json_object(line,'$.common.ar'),
          get_json_object(line,'$.common.ba'),
          get_json_object(line,'$.common.ch'),
          get_json_object(line,'$.common.is_new'),
          get_json_object(line,'$.common.md'),
          get_json_object(line,'$.common.mid'),
          get_json_object(line,'$.common.os'),
          get_json_object(line,'$.common.uid'),
          get_json_object(line,'$.common.vc'),
          get_json_object(line,'$.page.during_time'),
          get_json_object(line,'$.page.item'),
          get_json_object(line,'$.page.item_type'),
          get_json_object(line,'$.page.last_page_id'),
          get_json_object(line,'$.page.page_id'),
          get_json_object(line,'$.page.source_type'),
          get_json_object(line,'$.ts')
      from ${APP}.ods_log
      where dt='$do_date'
      and get_json_object(line,'$.page') is not null;"
      
      dwd_action_log="
      set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
      insert overwrite table ${APP}.dwd_action_log partition(dt='$do_date')
      select
          get_json_object(line,'$.common.ar'),
          get_json_object(line,'$.common.ba'),
          get_json_object(line,'$.common.ch'),
          get_json_object(line,'$.common.is_new'),
          get_json_object(line,'$.common.md'),
          get_json_object(line,'$.common.mid'),
          get_json_object(line,'$.common.os'),
          get_json_object(line,'$.common.uid'),
          get_json_object(line,'$.common.vc'),
          get_json_object(line,'$.page.during_time'),
          get_json_object(line,'$.page.item'),
          get_json_object(line,'$.page.item_type'),
          get_json_object(line,'$.page.last_page_id'),
          get_json_object(line,'$.page.page_id'),
          get_json_object(line,'$.page.source_type'),
          get_json_object(action,'$.action_id'),
          get_json_object(action,'$.item'),
          get_json_object(action,'$.item_type'),
          get_json_object(action,'$.ts')
      from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.actions')) tmp as action
      where dt='$do_date'
      and get_json_object(line,'$.actions') is not null;"
      
      
      dwd_display_log="
      set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
      insert overwrite table ${APP}.dwd_display_log partition(dt='$do_date')
      select
          get_json_object(line,'$.common.ar'),
          get_json_object(line,'$.common.ba'),
          get_json_object(line,'$.common.ch'),
          get_json_object(line,'$.common.is_new'),
          get_json_object(line,'$.common.md'),
          get_json_object(line,'$.common.mid'),
          get_json_object(line,'$.common.os'),
          get_json_object(line,'$.common.uid'),
          get_json_object(line,'$.common.vc'),
          get_json_object(line,'$.page.during_time'),
          get_json_object(line,'$.page.item'),
          get_json_object(line,'$.page.item_type'),
          get_json_object(line,'$.page.last_page_id'),
          get_json_object(line,'$.page.page_id'),
          get_json_object(line,'$.page.source_type'),
          get_json_object(line,'$.ts'),
          get_json_object(display,'$.display_type'),
          get_json_object(display,'$.item'),
          get_json_object(display,'$.item_type'),
          get_json_object(display,'$.order'),
          get_json_object(display,'$.pos_id')
      from ${APP}.ods_log lateral view ${APP}.explode_json_array(get_json_object(line,'$.displays')) tmp as display
      where dt='$do_date'
      and get_json_object(line,'$.displays') is not null;"
      
      
      dwd_error_log="
      set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
      insert overwrite table ${APP}.dwd_error_log partition(dt='$do_date')
      select
          get_json_object(line,'$.common.ar'),
          get_json_object(line,'$.common.ba'),
          get_json_object(line,'$.common.ch'),
          get_json_object(line,'$.common.is_new'),
          get_json_object(line,'$.common.md'),
          get_json_object(line,'$.common.mid'),
          get_json_object(line,'$.common.os'),
          get_json_object(line,'$.common.uid'),
          get_json_object(line,'$.common.vc'),
          get_json_object(line,'$.page.item'),
          get_json_object(line,'$.page.item_type'),
          get_json_object(line,'$.page.last_page_id'),
          get_json_object(line,'$.page.page_id'),
          get_json_object(line,'$.page.source_type'),
          get_json_object(line,'$.start.entry'),
          get_json_object(line,'$.start.loading_time'),
          get_json_object(line,'$.start.open_ad_id'),
          get_json_object(line,'$.start.open_ad_ms'),
          get_json_object(line,'$.start.open_ad_skip_ms'),
          get_json_object(line,'$.actions'),
          get_json_object(line,'$.displays'),
          get_json_object(line,'$.ts'),
          get_json_object(line,'$.err.error_code'),
          get_json_object(line,'$.err.msg')
      from ${APP}.ods_log
      where dt='$do_date'
      and get_json_object(line,'$.err') is not null;"
      
      
      case $1 in
          dwd_start_log )
              hive -e "$dwd_start_log"
          ;;
          dwd_page_log )
              hive -e "$dwd_page_log"
          ;;
          dwd_action_log )
              hive -e "$dwd_action_log"
          ;;
          dwd_display_log )
              hive -e "$dwd_display_log"
          ;;
          dwd_error_log )
              hive -e "$dwd_error_log"
          ;;
          all )
              hive -e "$dwd_start_log$dwd_page_log$dwd_action_log$dwd_display_log$dwd_error_log"
          ;;
      esac
      
    2. Add execute permission

    3. Run the script

      ods_to_dwd_log.sh all 2020-06-14
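
The loading script relies on three shell behaviours that are easy to miss: the date falls back to yesterday when the second argument is absent; inside a double-quoted string the shell expands `$do_date` but leaves `'$.common.ar'` untouched, because `.` cannot begin a variable name; and the `case` statement silently does nothing for an unrecognised table name. The sketch below isolates these points. The function names `resolve_date`, `build_sql`, and `is_known_target` are hypothetical and do not appear in the original script, and `date -d` assumes GNU date.

```shell
#!/bin/bash
# Hypothetical helpers illustrating ods_to_dwd_log.sh; the names are not from the original script.

# Return the given date, or yesterday when no date is passed (GNU date assumed).
resolve_date() {
    if [ -n "$1" ]; then
        echo "$1"
    else
        date -d "-1 day" +%F
    fi
}

# Build a one-column query. Inside double quotes the shell expands the do_date
# variable, but leaves the JSON path alone because '.' cannot begin a variable name.
build_sql() {
    local do_date="$1"
    echo "select get_json_object(line,'$.common.ar') from gmall.ods_log where dt='$do_date';"
}

# Succeed only for the targets that the original case statement handles.
is_known_target() {
    case "$1" in
        dwd_start_log|dwd_page_log|dwd_action_log|dwd_display_log|dwd_error_log|all) return 0 ;;
        *) return 1 ;;
    esac
}
```

One possible hardening of the real script would be a `*)` branch that prints a usage message and exits with a non-zero status, so that a mistyped table name is reported instead of being ignored.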
      

# DWD layer (business data)

For business data, the main concern when building the DWD layer is dimensional modeling.

# Comment fact table (transactional fact table)

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_comment_info;
    CREATE EXTERNAL TABLE dwd_comment_info(
        `id` STRING COMMENT 'ID',
        `user_id` STRING COMMENT 'User ID',
        `sku_id` STRING COMMENT 'Product SKU',
        `spu_id` STRING COMMENT 'Product SPU',
        `order_id` STRING COMMENT 'Order ID',
        `appraise` STRING COMMENT 'Rating (positive, neutral, negative, default)',
        `create_time` STRING COMMENT 'Comment time'
    ) COMMENT 'Comment fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_comment_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");

  2. Partition planning

  3. Data loading

    1. First-day load

      insert overwrite table dwd_comment_info partition (dt)
      select
          id,
          user_id,
          sku_id,
          spu_id,
          order_id,
          appraise,
          create_time,
          date_format(create_time,'yyyy-MM-dd')
      from ods_comment_info
      where dt='2020-06-14';
      
    2. Daily load

      insert overwrite table dwd_comment_info partition(dt='2020-06-15')
      select
          id,
          user_id,
          sku_id,
          spu_id,
          order_id,
          appraise,
          create_time
      from ods_comment_info where dt='2020-06-15';
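
The daily load above hard-codes dt='2020-06-15'. In a scheduled job the date would normally be a parameter, in the same way as in ods_to_dwd_log.sh. A minimal sketch, assuming a hypothetical function `daily_comment_sql` that only prints the statement so it can then be passed to `hive -e`; the `gmall` database prefix is taken from the earlier script.

```shell
#!/bin/bash
# Hypothetical helper: print the daily-load statement for dwd_comment_info for one date.
daily_comment_sql() {
    local do_date="$1"

    # Reject anything that is not shaped like yyyy-MM-dd before it reaches the SQL text.
    case "$do_date" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) echo "expected a date like 2020-06-15, got: $do_date" >&2; return 1 ;;
    esac

    cat <<EOF
insert overwrite table gmall.dwd_comment_info partition(dt='$do_date')
select
    id,
    user_id,
    sku_id,
    spu_id,
    order_id,
    appraise,
    create_time
from gmall.ods_comment_info where dt='$do_date';
EOF
}
```

For example, `hive -e "$(daily_comment_sql 2020-06-16)"` would load the partition for 16 June.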
      

# Order detail fact table (transactional fact table)

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_order_detail;
    CREATE EXTERNAL TABLE dwd_order_detail (
        `id` STRING COMMENT 'Order detail ID',
        `order_id` STRING COMMENT 'Order ID',
        `user_id` STRING COMMENT 'User ID',
        `sku_id` STRING COMMENT 'SKU ID',
        `province_id` STRING COMMENT 'Province ID',
        `activity_id` STRING COMMENT 'Activity ID',
        `activity_rule_id` STRING COMMENT 'Activity rule ID',
        `coupon_id` STRING COMMENT 'Coupon ID',
        `create_time` STRING COMMENT 'Creation time',
        `source_type` STRING COMMENT 'Source type',
        `source_id` STRING COMMENT 'Source ID',
        `sku_num` BIGINT COMMENT 'Item quantity',
        `original_amount` DECIMAL(16,2) COMMENT 'Original amount',
        `split_activity_amount` DECIMAL(16,2) COMMENT 'Allocated activity discount',
        `split_coupon_amount` DECIMAL(16,2) COMMENT 'Allocated coupon discount',
        `split_final_amount` DECIMAL(16,2) COMMENT 'Allocated final amount'
    ) COMMENT 'Order detail fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_order_detail/'
    TBLPROPERTIES ("parquet.compression"="lzo");

  2. Partition planning

  3. Data loading

    1. First-day load

      insert overwrite table dwd_order_detail partition(dt)
      select
          od.id,
          od.order_id,
          oi.user_id,
          od.sku_id,
          oi.province_id,
          oda.activity_id,
          oda.activity_rule_id,
          odc.coupon_id,
          od.create_time,
          od.source_type,
          od.source_id,
          od.sku_num,
          od.order_price*od.sku_num,
          od.split_activity_amount,
          od.split_coupon_amount,
          od.split_final_amount,
          date_format(create_time,'yyyy-MM-dd')
      from
      (
          select
              *
          from ods_order_detail
          where dt='2020-06-14'
      )od
      left join
      (
          select
              id,
              user_id,
              province_id
          from ods_order_info
          where dt='2020-06-14'
      )oi
      on od.order_id=oi.id
      left join
      (
          select
              order_detail_id,
              activity_id,
              activity_rule_id
          from ods_order_detail_activity
          where dt='2020-06-14'
      )oda
      on od.id=oda.order_detail_id
      left join
      (
          select
              order_detail_id,
              coupon_id
          from ods_order_detail_coupon
          where dt='2020-06-14'
      )odc
      on od.id=odc.order_detail_id;
      
    2. Daily load

      insert overwrite table dwd_order_detail partition(dt='2020-06-15')
      select
          od.id,
          od.order_id,
          oi.user_id,
          od.sku_id,
          oi.province_id,
          oda.activity_id,
          oda.activity_rule_id,
          odc.coupon_id,
          od.create_time,
          od.source_type,
          od.source_id,
          od.sku_num,
          od.order_price*od.sku_num,
          od.split_activity_amount,
          od.split_coupon_amount,
          od.split_final_amount
      from
      (
          select
              *
          from ods_order_detail
          where dt='2020-06-15'
      )od
      left join
      (
          select
              id,
              user_id,
              province_id
          from ods_order_info
          where dt='2020-06-15'
      )oi
      on od.order_id=oi.id
      left join
      (
          select
              order_detail_id,
              activity_id,
              activity_rule_id
          from ods_order_detail_activity
          where dt='2020-06-15'
      )oda
      on od.id=oda.order_detail_id
      left join
      (
          select
              order_detail_id,
              coupon_id
          from ods_order_detail_coupon
          where dt='2020-06-15'
      )odc
      on od.id=odc.order_detail_id;
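
The first-day load writes to `partition(dt)` without a static value, so each row's partition is decided by the last selected column, `date_format(create_time,'yyyy-MM-dd')`. Hive's strict dynamic-partition mode rejects a statement in which every partition column is dynamic, so the session may need `hive.exec.dynamic.partition.mode=nonstrict` unless that is already configured. A minimal sketch, assuming a hypothetical wrapper `with_dynamic_partition` that prepends the setting to a statement, and a hypothetical `partition_of` that mimics the partition value for a `create_time` string assumed to look like `yyyy-MM-dd HH:mm:ss`:

```shell
#!/bin/bash
# Hypothetical helper: prefix a statement with the dynamic-partition setting.
with_dynamic_partition() {
    local sql="$1"
    printf '%s\n%s\n' "set hive.exec.dynamic.partition.mode=nonstrict;" "$sql"
}

# Hypothetical helper: keep only the date part of a 'yyyy-MM-dd HH:mm:ss' string,
# which is the value the first-day load uses as the dt partition.
partition_of() {
    local create_time="$1"
    echo "${create_time%% *}"
}
```

The first-day statement could then be run as `hive -e "$(with_dynamic_partition "$first_day_sql")"`, where `first_day_sql` is an assumed variable holding the insert statement. The daily load names its partition explicitly and does not need the setting.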
      

# Order refund fact table (transactional fact table)

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_order_refund_info;
    CREATE EXTERNAL TABLE dwd_order_refund_info(
        `id` STRING COMMENT 'ID',
        `user_id` STRING COMMENT 'User ID',
        `order_id` STRING COMMENT 'Order ID',
        `sku_id` STRING COMMENT 'Product ID',
        `province_id` STRING COMMENT 'Region ID',
        `refund_type` STRING COMMENT 'Refund type',
        `refund_num` BIGINT COMMENT 'Number of items refunded',
        `refund_amount` DECIMAL(16,2) COMMENT 'Refund amount',
        `refund_reason_type` STRING COMMENT 'Refund reason type',
        `create_time` STRING COMMENT 'Refund request time'
    ) COMMENT 'Order refund fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_order_refund_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Partition planning

  3. Data loading

    1. First-day load

      insert overwrite table dwd_order_refund_info partition(dt)
      select
          ri.id,
          ri.user_id,
          ri.order_id,
          ri.sku_id,
          oi.province_id,
          ri.refund_type,
          ri.refund_num,
          ri.refund_amount,
          ri.refund_reason_type,
          ri.create_time,
          date_format(ri.create_time,'yyyy-MM-dd')
      from
      (
          select * from ods_order_refund_info where dt='2020-06-14'
      )ri
      left join
      (
          select id,province_id from ods_order_info where dt='2020-06-14'
      )oi
      on ri.order_id=oi.id;
      
    2. Daily load

      insert overwrite table dwd_order_refund_info partition(dt='2020-06-15')
      select
          ri.id,
          ri.user_id,
          ri.order_id,
          ri.sku_id,
          oi.province_id,
          ri.refund_type,
          ri.refund_num,
          ri.refund_amount,
          ri.refund_reason_type,
          ri.create_time
      from
      (
          select * from ods_order_refund_info where dt='2020-06-15'
      )ri
      left join
      (
          select id,province_id from ods_order_info where dt='2020-06-15'
      )oi
      on ri.order_id=oi.id;
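
The two statements above differ in how the target partition is chosen. The first-day load writes to `partition(dt)` and appends `date_format(ri.create_time,'yyyy-MM-dd')` as the last column, so each historical row is routed to the partition of its own creation date. The daily load writes everything into the fixed `dt='2020-06-15'` partition. A minimal Python sketch of the difference follows. The function names are hypothetical, and it assumes `create_time` is a `yyyy-MM-dd HH:mm:ss` string.

```python
from collections import defaultdict


def first_day_partitions(rows):
    """Dynamic partitioning: dt is taken from the date part of create_time."""
    parts = defaultdict(list)
    for row in rows:
        # Assumes 'yyyy-MM-dd HH:mm:ss', so the first 10 characters are the date.
        parts[row["create_time"][:10]].append(row)
    return dict(parts)


def daily_partition(rows, do_date):
    """Static partitioning: every row lands in the dt=do_date partition."""
    return {do_date: list(rows)}
```

For a transactional fact table this means the first-day run may create many partitions at once, while each later run touches exactly one.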
      

# Cart fact table (periodic snapshot fact table, daily snapshot)

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_cart_info;
    CREATE EXTERNAL TABLE dwd_cart_info(
        `id` STRING COMMENT 'ID',
        `user_id` STRING COMMENT 'User ID',
        `sku_id` STRING COMMENT 'Product ID',
        `source_type` STRING COMMENT 'Source type',
        `source_id` STRING COMMENT 'Source ID',
        `cart_price` DECIMAL(16,2) COMMENT 'Price when added to the cart',
        `is_ordered` STRING COMMENT 'Whether an order has been placed',
        `create_time` STRING COMMENT 'Creation time',
        `operate_time` STRING COMMENT 'Modification time',
        `order_time` STRING COMMENT 'Order time',
        `sku_num` BIGINT COMMENT 'Quantity added to the cart'
    ) COMMENT 'Cart fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_cart_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Partition planning

  3. Data loading

    1. First-day load

      insert overwrite table dwd_cart_info partition(dt='2020-06-14')
      select
          id,
          user_id,
          sku_id,
          source_type,
          source_id,
          cart_price,
          is_ordered,
          create_time,
          operate_time,
          order_time,
          sku_num
      from ods_cart_info
      where dt='2020-06-14';
      
    2. Daily load

      insert overwrite table dwd_cart_info partition(dt='2020-06-15')
      select
          id,
          user_id,
          sku_id,
          source_type,
          source_id,
          cart_price,
          is_ordered,
          create_time,
          operate_time,
          order_time,
          sku_num
      from ods_cart_info
      where dt='2020-06-15';
      

# Favorites fact table (periodic snapshot fact table, daily snapshot)

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_favor_info;
    CREATE EXTERNAL TABLE dwd_favor_info(
        `id` STRING COMMENT 'ID',
        `user_id` STRING COMMENT 'User ID',
        `sku_id` STRING COMMENT 'SKU ID',
        `spu_id` STRING COMMENT 'SPU ID',
        `is_cancel` STRING COMMENT 'Whether cancelled',
        `create_time` STRING COMMENT 'Time added to favorites',
        `cancel_time` STRING COMMENT 'Cancellation time'
    ) COMMENT 'Favorites fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_favor_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Partition planning

  3. Data loading

    1. First-day load

      insert overwrite table dwd_favor_info partition(dt='2020-06-14')
      select
          id,
          user_id,
          sku_id,
          spu_id,
          is_cancel,
          create_time,
          cancel_time
      from ods_favor_info
      where dt='2020-06-14';
      
    2. Daily load

      insert overwrite table dwd_favor_info partition(dt='2020-06-15')
      select
          id,
          user_id,
          sku_id,
          spu_id,
          is_cancel,
          create_time,
          cancel_time
      from ods_favor_info
      where dt='2020-06-15';
      

# Coupon usage fact table (accumulating snapshot fact table)

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_coupon_use;
    CREATE EXTERNAL TABLE dwd_coupon_use(
        `id` STRING COMMENT 'ID',
        `coupon_id` STRING COMMENT 'Coupon ID',
        `user_id` STRING COMMENT 'User ID',
        `order_id` STRING COMMENT 'Order ID',
        `coupon_status` STRING COMMENT 'Coupon status',
        `get_time` STRING COMMENT 'Claim time',
        `using_time` STRING COMMENT 'Use time (order placed)',
        `used_time` STRING COMMENT 'Use time (payment)',
        `expire_time` STRING COMMENT 'Expiration time'
    ) COMMENT 'Coupon usage fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_coupon_use/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Partition planning

  3. Data loading

    1. First-day load

      insert overwrite table dwd_coupon_use partition(dt)
      select
          id,
          coupon_id,
          user_id,
          order_id,
          coupon_status,
          get_time,
          using_time,
          used_time,
          expire_time,
          coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
      from ods_coupon_use
      where dt='2020-06-14';
      
    2. Daily load

      Load logic

      Load statement

      insert overwrite table dwd_coupon_use partition(dt)
      select
          nvl(new.id,old.id),
          nvl(new.coupon_id,old.coupon_id),
          nvl(new.user_id,old.user_id),
          nvl(new.order_id,old.order_id),
          nvl(new.coupon_status,old.coupon_status),
          nvl(new.get_time,old.get_time),
          nvl(new.using_time,old.using_time),
          nvl(new.used_time,old.used_time),
          nvl(new.expire_time,old.expire_time),
          coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
      from
      (
          select
              id,
              coupon_id,
              user_id,
              order_id,
              coupon_status,
              get_time,
              using_time,
              used_time,
              expire_time
          from dwd_coupon_use
          where dt='9999-99-99'
      )old
      full outer join
      (
          select
              id,
              coupon_id,
              user_id,
              order_id,
              coupon_status,
              get_time,
              using_time,
              used_time,
              expire_time
          from ods_coupon_use
          where dt='2020-06-15'
      )new
      on old.id=new.id;
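
The daily load above follows the usual accumulating-snapshot pattern: the still-open records in the `dt='9999-99-99'` partition are full-outer-joined with the day's new and changed records, each column is taken as `nvl(new.x, old.x)`, and the merged row is routed to the partition of its `used_time`, else its `expire_time`, else back to `9999-99-99`. The Python sketch below models only that merge-and-route step. The names `merge_coupon_use` and `target_partition` are hypothetical, and timestamps are assumed to be `yyyy-MM-dd HH:mm:ss` strings.

```python
OPEN_PARTITION = "9999-99-99"


def target_partition(used_time, expire_time):
    """coalesce(date(used_time), date(expire_time), '9999-99-99')."""
    for timestamp in (used_time, expire_time):
        if timestamp is not None:
            return timestamp[:10]  # date part of 'yyyy-MM-dd HH:mm:ss'
    return OPEN_PARTITION


def merge_coupon_use(old_rows, new_rows):
    """Full outer join on id, taking nvl(new.x, old.x) for every column."""
    old_by_id = {row["id"]: row for row in old_rows}
    new_by_id = {row["id"]: row for row in new_rows}

    partitions = {}
    for key in sorted(old_by_id.keys() | new_by_id.keys()):
        old = old_by_id.get(key, {})
        new = new_by_id.get(key, {})
        merged = {}
        for field in old.keys() | new.keys():
            value = new.get(field)
            merged[field] = value if value is not None else old.get(field)
        dt = target_partition(merged.get("used_time"), merged.get("expire_time"))
        partitions.setdefault(dt, []).append(merged)
    return partitions
```

A record stays in the open partition until it is used or expires, at which point it moves to a dated partition and is no longer read by later runs.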
      

# Payment fact table (accumulating snapshot fact table)

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_payment_info;
    CREATE EXTERNAL TABLE dwd_payment_info (
        `id` STRING COMMENT 'ID',
        `order_id` STRING COMMENT 'Order ID',
        `user_id` STRING COMMENT 'User ID',
        `province_id` STRING COMMENT 'Region ID',
        `trade_no` STRING COMMENT 'Trade number',
        `out_trade_no` STRING COMMENT 'External trade number',
        `payment_type` STRING COMMENT 'Payment type',
        `payment_amount` DECIMAL(16,2) COMMENT 'Payment amount',
        `payment_status` STRING COMMENT 'Payment status',
        `create_time` STRING COMMENT 'Creation time',-- time the third-party payment API was called
        `callback_time` STRING COMMENT 'Completion time'-- payment completion time, i.e. the time of the payment-success callback
    ) COMMENT 'Payment fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_payment_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Partition planning

  3. Data loading

    1. First-day load

      insert overwrite table dwd_payment_info partition(dt)
      select
          pi.id,
          pi.order_id,
          pi.user_id,
          oi.province_id,
          pi.trade_no,
          pi.out_trade_no,
          pi.payment_type,
          pi.payment_amount,
          pi.payment_status,
          pi.create_time,
          pi.callback_time,
          nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
      from
      (
          select * from ods_payment_info where dt='2020-06-14'
      )pi
      left join
      (
          select id,province_id from ods_order_info where dt='2020-06-14'
      )oi
      on pi.order_id=oi.id;
      
    2. Daily load

      insert overwrite table dwd_payment_info partition(dt)
      select
          nvl(new.id,old.id),
          nvl(new.order_id,old.order_id),
          nvl(new.user_id,old.user_id),
          nvl(new.province_id,old.province_id),
          nvl(new.trade_no,old.trade_no),
          nvl(new.out_trade_no,old.out_trade_no),
          nvl(new.payment_type,old.payment_type),
          nvl(new.payment_amount,old.payment_amount),
          nvl(new.payment_status,old.payment_status),
          nvl(new.create_time,old.create_time),
          nvl(new.callback_time,old.callback_time),
          nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
      from
      (
          select id,
             order_id,
             user_id,
             province_id,
             trade_no,
             out_trade_no,
             payment_type,
             payment_amount,
             payment_status,
             create_time,
             callback_time
          from dwd_payment_info
          where dt = '9999-99-99'
      )old
      full outer join
      (
          select
              pi.id,
              pi.out_trade_no,
              pi.order_id,
              pi.user_id,
              oi.province_id,
              pi.payment_type,
              pi.trade_no,
              pi.payment_amount,
              pi.payment_status,
              pi.create_time,
              pi.callback_time
          from
          (
              select * from ods_payment_info where dt='2020-06-15'
          )pi
          left join
          (
              select id,province_id from ods_order_info where dt='2020-06-15'
          )oi
          on pi.order_id=oi.id
      )new
      on old.id=new.id;
      

# Refund payment fact table (accumulating snapshot fact table)

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_refund_payment;
    CREATE EXTERNAL TABLE dwd_refund_payment (
        `id` STRING COMMENT 'ID',
        `user_id` STRING COMMENT 'User ID',
        `order_id` STRING COMMENT 'Order ID',
        `sku_id` STRING COMMENT 'SKU ID',
        `province_id` STRING COMMENT 'Region ID',
        `trade_no` STRING COMMENT 'Trade number',
        `out_trade_no` STRING COMMENT 'External trade number',
        `payment_type` STRING COMMENT 'Payment type',
        `refund_amount` DECIMAL(16,2) COMMENT 'Refund amount',
        `refund_status` STRING COMMENT 'Refund status',
        `create_time` STRING COMMENT 'Creation time',-- time the third-party payment API was called
        `callback_time` STRING COMMENT 'Callback time'-- time the payment API called back, i.e. the time the payment succeeded
    ) COMMENT 'Refund payment fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_refund_payment/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Partition planning

  3. Data loading

    1. First-day load

        insert overwrite table dwd_refund_payment partition(dt)
        select
            rp.id,
            user_id,
            order_id,
            sku_id,
            province_id,
            trade_no,
            out_trade_no,
            payment_type,
            refund_amount,
            refund_status,
            create_time,
            callback_time,
            nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
        from
        (
            select
                id,
                out_trade_no,
                order_id,
                sku_id,
                payment_type,
                trade_no,
                refund_amount,
                refund_status,
                create_time,
                callback_time
            from ods_refund_payment
            where dt='2020-06-14'
        )rp
        left join
        (
            select
                id,
                user_id,
                province_id
            from ods_order_info
            where dt='2020-06-14'
        )oi
        on rp.order_id=oi.id;

    2. Daily load

        insert overwrite table dwd_refund_payment partition(dt)
        select
            nvl(new.id,old.id),
            nvl(new.user_id,old.user_id),
            nvl(new.order_id,old.order_id),
            nvl(new.sku_id,old.sku_id),
            nvl(new.province_id,old.province_id),
            nvl(new.trade_no,old.trade_no),
            nvl(new.out_trade_no,old.out_trade_no),
            nvl(new.payment_type,old.payment_type),
            nvl(new.refund_amount,old.refund_amount),
            nvl(new.refund_status,old.refund_status),
            nvl(new.create_time,old.create_time),
            nvl(new.callback_time,old.callback_time),
            nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
        from
        (
            select
                id,
                user_id,
                order_id,
                sku_id,
                province_id,
                trade_no,
                out_trade_no,
                payment_type,
                refund_amount,
                refund_status,
                create_time,
                callback_time
            from dwd_refund_payment
            where dt='9999-99-99'
        )old
        full outer join
        (
            select
                rp.id,
                user_id,
                order_id,
                sku_id,
                province_id,
                trade_no,
                out_trade_no,
                payment_type,
                refund_amount,
                refund_status,
                create_time,
                callback_time
            from
            (
                select
                    id,
                    out_trade_no,
                    order_id,
                    sku_id,
                    payment_type,
                    trade_no,
                    refund_amount,
                    refund_status,
                    create_time,
                    callback_time
                from ods_refund_payment
                where dt='2020-06-15'
            )rp
            left join
            (
                select
                    id,
                    user_id,
                    province_id
                from ods_order_info
                where dt='2020-06-15'
            )oi
            on rp.order_id=oi.id
        )new
        on old.id=new.id;
        

# Order fact table (accumulating snapshot fact table)

  1. Table creation statement

    DROP TABLE IF EXISTS dwd_order_info;
    CREATE EXTERNAL TABLE dwd_order_info(
        `id` STRING COMMENT 'ID',
        `order_status` STRING COMMENT 'Order status',
        `user_id` STRING COMMENT 'User ID',
        `province_id` STRING COMMENT 'Region ID',
        `payment_way` STRING COMMENT 'Payment method',
        `delivery_address` STRING COMMENT 'Delivery address',
        `out_trade_no` STRING COMMENT 'External trade number',
        `tracking_no` STRING COMMENT 'Tracking number',
        `create_time` STRING COMMENT 'Creation time (unpaid status)',
        `payment_time` STRING COMMENT 'Payment time (paid status)',
        `cancel_time` STRING COMMENT 'Cancellation time (cancelled status)',
        `finish_time` STRING COMMENT 'Completion time (completed status)',
        `refund_time` STRING COMMENT 'Refund time (refund in progress status)',
        `refund_finish_time` STRING COMMENT 'Refund completion time (refund completed status)',
        `expire_time` STRING COMMENT 'Expiration time',
        `feight_fee` DECIMAL(16,2) COMMENT 'Freight fee',
        `feight_fee_reduce` DECIMAL(16,2) COMMENT 'Freight fee reduction',
        `activity_reduce_amount` DECIMAL(16,2) COMMENT 'Activity discount',
        `coupon_reduce_amount` DECIMAL(16,2) COMMENT 'Coupon discount',
        `original_amount` DECIMAL(16,2) COMMENT 'Original order amount',
        `final_amount` DECIMAL(16,2) COMMENT 'Final order amount'
    ) COMMENT 'Order fact table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwd/dwd_order_info/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Partition planning

  3. Data loading

    1. First-day load

      insert overwrite table dwd_order_info partition(dt)
      select
          oi.id,
          oi.order_status,
          oi.user_id,
          oi.province_id,
          oi.payment_way,
          oi.delivery_address,
          oi.out_trade_no,
          oi.tracking_no,
          oi.create_time,
          times.ts['1002'] payment_time,
          times.ts['1003'] cancel_time,
          times.ts['1004'] finish_time,
          times.ts['1005'] refund_time,
          times.ts['1006'] refund_finish_time,
          oi.expire_time,
          feight_fee,
          feight_fee_reduce,
          activity_reduce_amount,
          coupon_reduce_amount,
          original_amount,
          final_amount,
          case
              when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
              when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='2020-06-14' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
              when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
              when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
              else '9999-99-99'
          end
      from
      (
          select
              *
          from ods_order_info
          where dt='2020-06-14'
      )oi
      left join
      (
          select
              order_id,
              str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
          from ods_order_status_log
          where dt='2020-06-14'
          group by order_id
      )times
      on oi.id=times.order_id;
      
    2. Daily load

      insert overwrite table dwd_order_info partition(dt)
      select
          nvl(new.id,old.id),
          nvl(new.order_status,old.order_status),
          nvl(new.user_id,old.user_id),
          nvl(new.province_id,old.province_id),
          nvl(new.payment_way,old.payment_way),
          nvl(new.delivery_address,old.delivery_address),
          nvl(new.out_trade_no,old.out_trade_no),
          nvl(new.tracking_no,old.tracking_no),
          nvl(new.create_time,old.create_time),
          nvl(new.payment_time,old.payment_time),
          nvl(new.cancel_time,old.cancel_time),
          nvl(new.finish_time,old.finish_time),
          nvl(new.refund_time,old.refund_time),
          nvl(new.refund_finish_time,old.refund_finish_time),
          nvl(new.expire_time,old.expire_time),
          nvl(new.feight_fee,old.feight_fee),
          nvl(new.feight_fee_reduce,old.feight_fee_reduce),
          nvl(new.activity_reduce_amount,old.activity_reduce_amount),
          nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
          nvl(new.original_amount,old.original_amount),
          nvl(new.final_amount,old.final_amount),
          case
              when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
              when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='2020-06-15' and new.refund_time is null then '2020-06-15'
              when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
              when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
              else '9999-99-99'
          end
      from
      (
          select
              id,
              order_status,
              user_id,
              province_id,
              payment_way,
              delivery_address,
              out_trade_no,
              tracking_no,
              create_time,
              payment_time,
              cancel_time,
              finish_time,
              refund_time,
              refund_finish_time,
              expire_time,
              feight_fee,
              feight_fee_reduce,
              activity_reduce_amount,
              coupon_reduce_amount,
              original_amount,
              final_amount
          from dwd_order_info
          where dt='9999-99-99'
      )old
      full outer join
      (
          select
              oi.id,
              oi.order_status,
              oi.user_id,
              oi.province_id,
              oi.payment_way,
              oi.delivery_address,
              oi.out_trade_no,
              oi.tracking_no,
              oi.create_time,
              times.ts['1002'] payment_time,
              times.ts['1003'] cancel_time,
              times.ts['1004'] finish_time,
              times.ts['1005'] refund_time,
              times.ts['1006'] refund_finish_time,
              oi.expire_time,
              feight_fee,
              feight_fee_reduce,
              activity_reduce_amount,
              coupon_reduce_amount,
              original_amount,
              final_amount
          from
          (
              select
                  *
              from ods_order_info
              where dt='2020-06-15'
          )oi
          left join
          (
              select
                  order_id,
                  str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
              from ods_order_status_log
              where dt='2020-06-15'
              group by order_id
          )times
          on oi.id=times.order_id
      )new
      on old.id=new.id;
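
Two pieces of the order load are easy to misread: the `str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=')` expression, which folds an order's status log into a status-to-time map, and the `case` expression that picks the target partition. The Python sketch below models both under stated assumptions. The function names are hypothetical, timestamps are assumed to be `yyyy-MM-dd HH:mm:ss` strings, and the meaning of codes `1003` to `1006` is inferred from the column aliases in the statements (cancel, finish, refund, refund finish). The `first_day` flag switches between the `<=` comparison of the first-day load and the `=` comparison of the daily load.

```python
from datetime import date, timedelta

OPEN_PARTITION = "9999-99-99"


def status_times(log_rows):
    """Fold (order_status, operate_time) pairs into one dict per order."""
    return {status: operate_time for status, operate_time in log_rows}


def _day(timestamp):
    # Assumes 'yyyy-MM-dd HH:mm:ss', so the date is the first 10 characters.
    return timestamp[:10]


def order_partition(ts, expire_time, do_date, first_day):
    """Mirror the CASE expression that chooses the dt partition."""
    if ts.get("1003") is not None:                     # cancelled
        return _day(ts["1003"])
    finish = ts.get("1004")
    if finish is not None and ts.get("1005") is None:  # finished, no refund
        closed = (date.fromisoformat(_day(finish)) + timedelta(days=7)).isoformat()
        reached = closed <= do_date if first_day else closed == do_date
        if reached:
            return closed
    if ts.get("1006") is not None:                     # refund finished
        return _day(ts["1006"])
    if expire_time is not None:
        return _day(expire_time)
    return OPEN_PARTITION
```

An order that finished without a refund is therefore closed out seven days after `finish_time`; until one of the branches applies it remains in the `9999-99-99` partition.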
      

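# DWD layer business data first-day load script

  1. Create the script ods_to_dwd_db_init.sh in the /home/damoncai/bin directory

    vim ods_to_dwd_db_init.sh
    
    #!/bin/bash
    APP=gmall
    
    # The second argument is the load date; abort if it is missing.
    if [ -n "$2" ] ;then
       do_date=$2
    else 
       echo "Please pass in a date argument"
       exit 1
    fi 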
    
    dwd_order_info="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_order_info partition(dt)
    select
        oi.id,
        oi.order_status,
        oi.user_id,
        oi.province_id,
        oi.payment_way,
        oi.delivery_address,
        oi.out_trade_no,
        oi.tracking_no,
        oi.create_time,
        times.ts['1002'] payment_time,
        times.ts['1003'] cancel_time,
        times.ts['1004'] finish_time,
        times.ts['1005'] refund_time,
        times.ts['1006'] refund_finish_time,
        oi.expire_time,
        feight_fee,
        feight_fee_reduce,
        activity_reduce_amount,
        coupon_reduce_amount,
        original_amount,
        final_amount,
        case
            when times.ts['1003'] is not null then date_format(times.ts['1003'],'yyyy-MM-dd')
            when times.ts['1004'] is not null and date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)<='$do_date' and times.ts['1005'] is null then date_add(date_format(times.ts['1004'],'yyyy-MM-dd'),7)
            when times.ts['1006'] is not null then date_format(times.ts['1006'],'yyyy-MM-dd')
            when oi.expire_time is not null then date_format(oi.expire_time,'yyyy-MM-dd')
            else '9999-99-99'
        end
    from
    (
        select
            *
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    left join
    (
        select
            order_id,
            str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
        from ${APP}.ods_order_status_log
        where dt='$do_date'
        group by order_id
    )times
    on oi.id=times.order_id;"
    
    dwd_order_detail="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_order_detail partition(dt)
    select
        od.id,
        od.order_id,
        oi.user_id,
        od.sku_id,
        oi.province_id,
        oda.activity_id,
        oda.activity_rule_id,
        odc.coupon_id,
        od.create_time,
        od.source_type,
        od.source_id,
        od.sku_num,
        od.order_price*od.sku_num,
        od.split_activity_amount,
        od.split_coupon_amount,
        od.split_final_amount,
        date_format(create_time,'yyyy-MM-dd')
    from
    (
        select
            *
        from ${APP}.ods_order_detail
        where dt='$do_date'
    )od
    left join
    (
        select
            id,
            user_id,
            province_id
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    on od.order_id=oi.id
    left join
    (
        select
            order_detail_id,
            activity_id,
            activity_rule_id
        from ${APP}.ods_order_detail_activity
        where dt='$do_date'
    )oda
    on od.id=oda.order_detail_id
    left join
    (
        select
            order_detail_id,
            coupon_id
        from ${APP}.ods_order_detail_coupon
        where dt='$do_date'
    )odc
    on od.id=odc.order_detail_id;"
    
    dwd_payment_info="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_payment_info partition(dt)
    select
        pi.id,
        pi.order_id,
        pi.user_id,
        oi.province_id,
        pi.trade_no,
        pi.out_trade_no,
        pi.payment_type,
        pi.payment_amount,
        pi.payment_status,
        pi.create_time,
        pi.callback_time,
        nvl(date_format(pi.callback_time,'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select * from ${APP}.ods_payment_info where dt='$do_date'
    )pi
    left join
    (
        select id,province_id from ${APP}.ods_order_info where dt='$do_date'
    )oi
    on pi.order_id=oi.id;"
    
    dwd_cart_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
    select
        id,
        user_id,
        sku_id,
        source_type,
        source_id,
        cart_price,
        is_ordered,
        create_time,
        operate_time,
        order_time,
        sku_num
    from ${APP}.ods_cart_info
    where dt='$do_date';"
    
    dwd_comment_info="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_comment_info partition(dt)
    select
        id,
        user_id,
        sku_id,
        spu_id,
        order_id,
        appraise,
        create_time,
        date_format(create_time,'yyyy-MM-dd')
    from ${APP}.ods_comment_info
    where dt='$do_date';
    "
    
    dwd_favor_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
    select
        id,
        user_id,
        sku_id,
        spu_id,
        is_cancel,
        create_time,
        cancel_time
    from ${APP}.ods_favor_info
    where dt='$do_date';"
    
    dwd_coupon_use="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_coupon_use partition(dt)
    select
        id,
        coupon_id,
        user_id,
        order_id,
        coupon_status,
        get_time,
        using_time,
        used_time,
        expire_time,
        coalesce(date_format(used_time,'yyyy-MM-dd'),date_format(expire_time,'yyyy-MM-dd'),'9999-99-99')
    from ${APP}.ods_coupon_use
    where dt='$do_date';"
    
    dwd_order_refund_info="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_order_refund_info partition(dt)
    select
        ri.id,
        ri.user_id,
        ri.order_id,
        ri.sku_id,
        oi.province_id,
        ri.refund_type,
        ri.refund_num,
        ri.refund_amount,
        ri.refund_reason_type,
        ri.create_time,
        date_format(ri.create_time,'yyyy-MM-dd')
    from
    (
        select * from ${APP}.ods_order_refund_info where dt='$do_date'
    )ri
    left join
    (
        select id,province_id from ${APP}.ods_order_info where dt='$do_date'
    )oi
    on ri.order_id=oi.id;"
    
    dwd_refund_payment="
    set hive.exec.dynamic.partition.mode=nonstrict;
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_refund_payment partition(dt)
    select
        rp.id,
        user_id,
        order_id,
        sku_id,
        province_id,
        trade_no,
        out_trade_no,
        payment_type,
        refund_amount,
        refund_status,
        create_time,
        callback_time,
        nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select
            id,
            out_trade_no,
            order_id,
            sku_id,
            payment_type,
            trade_no,
            refund_amount,
            refund_status,
            create_time,
            callback_time
        from ${APP}.ods_refund_payment
        where dt='$do_date'
    )rp
    left join
    (
        select
            id,
            user_id,
            province_id
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    on rp.order_id=oi.id;"
    
    case $1 in
        dwd_order_info )
            hive -e "$dwd_order_info"
        ;;
        dwd_order_detail )
            hive -e "$dwd_order_detail"
        ;;
        dwd_payment_info )
            hive -e "$dwd_payment_info"
        ;;
        dwd_cart_info )
            hive -e "$dwd_cart_info"
        ;;
        dwd_comment_info )
            hive -e "$dwd_comment_info"
        ;;
        dwd_favor_info )
            hive -e "$dwd_favor_info"
        ;;
        dwd_coupon_use )
            hive -e "$dwd_coupon_use"
        ;;
        dwd_order_refund_info )
            hive -e "$dwd_order_refund_info"
        ;;
        dwd_refund_payment )
            hive -e "$dwd_refund_payment"
        ;;
        all )
            hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
        ;;
    esac
    
  2. Add execute permission

  3. Run the script

    ods_to_dwd_db_init.sh all 2020-06-14
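
In the first-day load, accumulating snapshot tables such as dwd_payment_info choose their `dt` partition with `nvl(date_format(callback_time,'yyyy-MM-dd'),'9999-99-99')`: a finished record lands in the partition for its completion date, and an unfinished one lands in `9999-99-99`. The rule can be sketched in shell as follows. The function name `partition_for` is hypothetical and only illustrates the rule; it is not part of the script above.

```shell
#!/bin/bash
# Hypothetical helper: mimic nvl(date_format(ts,'yyyy-MM-dd'),'9999-99-99').
# $1 is a timestamp such as "2020-06-14 10:20:30", or empty when the
# business process has not finished yet.
partition_for() {
    local callback_time=$1
    if [ -z "$callback_time" ]; then
        # Unfinished record: goes to the placeholder partition.
        echo "9999-99-99"
    else
        # Keep only the date part in front of the first space.
        echo "${callback_time%% *}"
    fi
}

partition_for "2020-06-14 10:20:30"
partition_for ""
```

The first call prints the completion date and the second prints the placeholder partition name.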
    

# DWD layer business data daily load script

  1. Create the script ods_to_dwd_db.sh in the /home/damoncai/bin directory

    vim ods_to_dwd_db.sh
    
    #!/bin/bash
    
    APP=gmall
    # Use the date passed as the second argument if there is one; otherwise use yesterday's date
    if [ -n "$2" ] ;then
        do_date=$2
    else 
        do_date=`date -d "-1 day" +%F`
    fi
    
    
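    # For an accumulating snapshot fact table, if every pending business record completes
    # on the same day, the insert overwrite writes no rows into the 9999-99-99 partition,
    # so its old data is not overwritten and the records end up duplicated.
    # This function checks the last-modified time of the 9999-99-99 partition to tell
    # whether it was overwritten; if it was not, the partition is cleared manually.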
    clear_data(){
        current_date=`date +%F`
        current_date_timestamp=`date -d "$current_date" +%s`
    
        last_modified_date=`hadoop fs -ls /warehouse/gmall/dwd/$1 | grep '9999-99-99' | awk '{print $6}'`
        last_modified_date_timestamp=`date -d "$last_modified_date" +%s`
    
        if [[ $last_modified_date_timestamp -lt $current_date_timestamp ]]; then
            echo "clear table $1 partition(dt=9999-99-99)"
            hadoop fs -rm -r -f /warehouse/gmall/dwd/$1/dt=9999-99-99/*
        fi
    }
    
    dwd_order_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table ${APP}.dwd_order_info partition(dt)
    select
        nvl(new.id,old.id),
        nvl(new.order_status,old.order_status),
        nvl(new.user_id,old.user_id),
        nvl(new.province_id,old.province_id),
        nvl(new.payment_way,old.payment_way),
        nvl(new.delivery_address,old.delivery_address),
        nvl(new.out_trade_no,old.out_trade_no),
        nvl(new.tracking_no,old.tracking_no),
        nvl(new.create_time,old.create_time),
        nvl(new.payment_time,old.payment_time),
        nvl(new.cancel_time,old.cancel_time),
        nvl(new.finish_time,old.finish_time),
        nvl(new.refund_time,old.refund_time),
        nvl(new.refund_finish_time,old.refund_finish_time),
        nvl(new.expire_time,old.expire_time),
        nvl(new.feight_fee,old.feight_fee),
        nvl(new.feight_fee_reduce,old.feight_fee_reduce),
        nvl(new.activity_reduce_amount,old.activity_reduce_amount),
        nvl(new.coupon_reduce_amount,old.coupon_reduce_amount),
        nvl(new.original_amount,old.original_amount),
        nvl(new.final_amount,old.final_amount),
        case
            when new.cancel_time is not null then date_format(new.cancel_time,'yyyy-MM-dd')
            when new.finish_time is not null and date_add(date_format(new.finish_time,'yyyy-MM-dd'),7)='$do_date' and new.refund_time is null then '$do_date'
            when new.refund_finish_time is not null then date_format(new.refund_finish_time,'yyyy-MM-dd')
            when new.expire_time is not null then date_format(new.expire_time,'yyyy-MM-dd')
            else '9999-99-99'
        end
    from
    (
        select
            id,
            order_status,
            user_id,
            province_id,
            payment_way,
            delivery_address,
            out_trade_no,
            tracking_no,
            create_time,
            payment_time,
            cancel_time,
            finish_time,
            refund_time,
            refund_finish_time,
            expire_time,
            feight_fee,
            feight_fee_reduce,
            activity_reduce_amount,
            coupon_reduce_amount,
            original_amount,
            final_amount
        from ${APP}.dwd_order_info
        where dt='9999-99-99'
    )old
    full outer join
    (
        select
            oi.id,
            oi.order_status,
            oi.user_id,
            oi.province_id,
            oi.payment_way,
            oi.delivery_address,
            oi.out_trade_no,
            oi.tracking_no,
            oi.create_time,
            times.ts['1002'] payment_time,
            times.ts['1003'] cancel_time,
            times.ts['1004'] finish_time,
            times.ts['1005'] refund_time,
            times.ts['1006'] refund_finish_time,
            oi.expire_time,
            feight_fee,
            feight_fee_reduce,
            activity_reduce_amount,
            coupon_reduce_amount,
            original_amount,
            final_amount
        from
        (
            select
                *
            from ${APP}.ods_order_info
            where dt='$do_date'
        )oi
        left join
        (
            select
                order_id,
                str_to_map(concat_ws(',',collect_set(concat(order_status,'=',operate_time))),',','=') ts
            from ${APP}.ods_order_status_log
            where dt='$do_date'
            group by order_id
        )times
        on oi.id=times.order_id
    )new
    on old.id=new.id;"
    
    dwd_order_detail="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_order_detail partition(dt='$do_date')
    select
        od.id,
        od.order_id,
        oi.user_id,
        od.sku_id,
        oi.province_id,
        oda.activity_id,
        oda.activity_rule_id,
        odc.coupon_id,
        od.create_time,
        od.source_type,
        od.source_id,
        od.sku_num,
        od.order_price*od.sku_num,
        od.split_activity_amount,
        od.split_coupon_amount,
        od.split_final_amount
    from
    (
        select
            *
        from ${APP}.ods_order_detail
        where dt='$do_date'
    )od
    left join
    (
        select
            id,
            user_id,
            province_id
        from ${APP}.ods_order_info
        where dt='$do_date'
    )oi
    on od.order_id=oi.id
    left join
    (
        select
            order_detail_id,
            activity_id,
            activity_rule_id
        from ${APP}.ods_order_detail_activity
        where dt='$do_date'
    )oda
    on od.id=oda.order_detail_id
    left join
    (
        select
            order_detail_id,
            coupon_id
        from ${APP}.ods_order_detail_coupon
        where dt='$do_date'
    )odc
    on od.id=odc.order_detail_id;"
    
    
    dwd_payment_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table ${APP}.dwd_payment_info partition(dt)
    select
        nvl(new.id,old.id),
        nvl(new.order_id,old.order_id),
        nvl(new.user_id,old.user_id),
        nvl(new.province_id,old.province_id),
        nvl(new.trade_no,old.trade_no),
        nvl(new.out_trade_no,old.out_trade_no),
        nvl(new.payment_type,old.payment_type),
        nvl(new.payment_amount,old.payment_amount),
        nvl(new.payment_status,old.payment_status),
        nvl(new.create_time,old.create_time),
        nvl(new.callback_time,old.callback_time),
        nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select id,
           order_id,
           user_id,
           province_id,
           trade_no,
           out_trade_no,
           payment_type,
           payment_amount,
           payment_status,
           create_time,
           callback_time
        from ${APP}.dwd_payment_info
        where dt = '9999-99-99'
    )old
    full outer join
    (
        select
            pi.id,
            pi.out_trade_no,
            pi.order_id,
            pi.user_id,
            oi.province_id,
            pi.payment_type,
            pi.trade_no,
            pi.payment_amount,
            pi.payment_status,
            pi.create_time,
            pi.callback_time
        from
        (
            select * from ${APP}.ods_payment_info where dt='$do_date'
        )pi
        left join
        (
            select id,province_id from ${APP}.ods_order_info where dt='$do_date'
        )oi
        on pi.order_id=oi.id
    )new
    on old.id=new.id;"
    
    dwd_cart_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_cart_info partition(dt='$do_date')
    select
        id,
        user_id,
        sku_id,
        source_type,
        source_id,
        cart_price,
        is_ordered,
        create_time,
        operate_time,
        order_time,
        sku_num
    from ${APP}.ods_cart_info
    where dt='$do_date';"
    
    
    dwd_comment_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_comment_info partition(dt='$do_date')
    select
        id,
        user_id,
        sku_id,
        spu_id,
        order_id,
        appraise,
        create_time
    from ${APP}.ods_comment_info where dt='$do_date';"
    
    
    dwd_favor_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_favor_info partition(dt='$do_date')
    select
        id,
        user_id,
        sku_id,
        spu_id,
        is_cancel,
        create_time,
        cancel_time
    from ${APP}.ods_favor_info
    where dt='$do_date';"
    
    
    dwd_coupon_use="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table ${APP}.dwd_coupon_use partition(dt)
    select
        nvl(new.id,old.id),
        nvl(new.coupon_id,old.coupon_id),
        nvl(new.user_id,old.user_id),
        nvl(new.order_id,old.order_id),
        nvl(new.coupon_status,old.coupon_status),
        nvl(new.get_time,old.get_time),
        nvl(new.using_time,old.using_time),
        nvl(new.used_time,old.used_time),
        nvl(new.expire_time,old.expire_time),
        coalesce(date_format(nvl(new.used_time,old.used_time),'yyyy-MM-dd'),date_format(nvl(new.expire_time,old.expire_time),'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select
            id,
            coupon_id,
            user_id,
            order_id,
            coupon_status,
            get_time,
            using_time,
            used_time,
            expire_time
        from ${APP}.dwd_coupon_use
        where dt='9999-99-99'
    )old
    full outer join
    (
        select
            id,
            coupon_id,
            user_id,
            order_id,
            coupon_status,
            get_time,
            using_time,
            used_time,
            expire_time
        from ${APP}.ods_coupon_use
        where dt='$do_date'
    )new
    on old.id=new.id;"
    
    dwd_order_refund_info="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    insert overwrite table ${APP}.dwd_order_refund_info partition(dt='$do_date')
    select
        ri.id,
        ri.user_id,
        ri.order_id,
        ri.sku_id,
        oi.province_id,
        ri.refund_type,
        ri.refund_num,
        ri.refund_amount,
        ri.refund_reason_type,
        ri.create_time
    from
    (
        select * from ${APP}.ods_order_refund_info where dt='$do_date'
    )ri
    left join
    (
        select id,province_id from ${APP}.ods_order_info where dt='$do_date'
    )oi
    on ri.order_id=oi.id;"
    
    
    dwd_refund_payment="
    set hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert overwrite table ${APP}.dwd_refund_payment partition(dt)
    select
        nvl(new.id,old.id),
        nvl(new.user_id,old.user_id),
        nvl(new.order_id,old.order_id),
        nvl(new.sku_id,old.sku_id),
        nvl(new.province_id,old.province_id),
        nvl(new.trade_no,old.trade_no),
        nvl(new.out_trade_no,old.out_trade_no),
        nvl(new.payment_type,old.payment_type),
        nvl(new.refund_amount,old.refund_amount),
        nvl(new.refund_status,old.refund_status),
        nvl(new.create_time,old.create_time),
        nvl(new.callback_time,old.callback_time),
        nvl(date_format(nvl(new.callback_time,old.callback_time),'yyyy-MM-dd'),'9999-99-99')
    from
    (
        select
            id,
            user_id,
            order_id,
            sku_id,
            province_id,
            trade_no,
            out_trade_no,
            payment_type,
            refund_amount,
            refund_status,
            create_time,
            callback_time
        from ${APP}.dwd_refund_payment
        where dt='9999-99-99'
    )old
    full outer join
    (
        select
            rp.id,
            user_id,
            order_id,
            sku_id,
            province_id,
            trade_no,
            out_trade_no,
            payment_type,
            refund_amount,
            refund_status,
            create_time,
            callback_time
        from
        (
            select
                id,
                out_trade_no,
                order_id,
                sku_id,
                payment_type,
                trade_no,
                refund_amount,
                refund_status,
                create_time,
                callback_time
            from ${APP}.ods_refund_payment
            where dt='$do_date'
        )rp
        left join
        (
            select
                id,
                user_id,
                province_id
            from ${APP}.ods_order_info
            where dt='$do_date'
        )oi
        on rp.order_id=oi.id
    )new
    on old.id=new.id;"
    
    case $1 in
        dwd_order_info )
            hive -e "$dwd_order_info"
            clear_data dwd_order_info
        ;;
        dwd_order_detail )
            hive -e "$dwd_order_detail"
        ;;
        dwd_payment_info )
            hive -e "$dwd_payment_info"
            clear_data dwd_payment_info
        ;;
        dwd_cart_info )
            hive -e "$dwd_cart_info"
        ;;
        dwd_comment_info )
            hive -e "$dwd_comment_info"
        ;;
        dwd_favor_info )
            hive -e "$dwd_favor_info"
        ;;
        dwd_coupon_use )
            hive -e "$dwd_coupon_use"
            clear_data dwd_coupon_use
        ;;
        dwd_order_refund_info )
            hive -e "$dwd_order_refund_info"
        ;;
        dwd_refund_payment )
            hive -e "$dwd_refund_payment"
            clear_data dwd_refund_payment
        ;;
        all )
            hive -e "$dwd_order_info$dwd_order_detail$dwd_payment_info$dwd_cart_info$dwd_comment_info$dwd_favor_info$dwd_coupon_use$dwd_order_refund_info$dwd_refund_payment"
            clear_data dwd_order_info
            clear_data dwd_payment_info
            clear_data dwd_coupon_use
            clear_data dwd_refund_payment
        ;;
    esac
    
  2. Add execute permission

  3. Run the script

    ods_to_dwd_db.sh all 2020-06-14
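
The staleness check inside `clear_data` can be tried out without a Hadoop cluster. The following is a minimal sketch, assuming GNU `date` (for the `-d` option, as the script itself does). The function name `partition_is_stale` and its optional second argument are hypothetical and not part of the original script; the empty-input guard is also an addition for illustration.

```shell
#!/bin/bash
# Hypothetical helper: succeed (exit status 0) when the partition's
# last-modified date is earlier than the reference date, i.e. the
# partition was not rewritten by today's load.
# $1: last-modified date as printed in column 6 of `hadoop fs -ls` (yyyy-MM-dd)
# $2: reference date, defaults to today
partition_is_stale() {
    local last_modified_date=$1
    local reference_date=${2:-$(date +%F)}

    # No listing line for the partition: treat it as not stale.
    if [ -z "$last_modified_date" ]; then
        return 1
    fi

    local last_ts ref_ts
    last_ts=$(date -d "$last_modified_date" +%s) || return 1
    ref_ts=$(date -d "$reference_date" +%s) || return 1

    [ "$last_ts" -lt "$ref_ts" ]
}

if partition_is_stale 2020-06-13 2020-06-14; then
    echo "partition would be cleared"
fi
```

Passing the reference date explicitly makes the comparison reproducible; in the real script it is always the current day.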
    

# Data warehouse build: DWS layer

# Visitor subject

  1. Table creation statement

    DROP TABLE IF EXISTS dws_visitor_action_daycount;
    CREATE EXTERNAL TABLE dws_visitor_action_daycount
    (
        `mid_id` STRING COMMENT 'Device ID',
        `brand` STRING COMMENT 'Device brand',
        `model` STRING COMMENT 'Device model',
        `is_new` STRING COMMENT 'Whether this is the first visit',
        `channel` ARRAY<STRING> COMMENT 'Channel',
        `os` ARRAY<STRING> COMMENT 'Operating system',
        `area_code` ARRAY<STRING> COMMENT 'Region ID',
        `version_code` ARRAY<STRING> COMMENT 'App version',
        `visit_count` BIGINT COMMENT 'Visit count',
        `page_stats` ARRAY<STRUCT<page_id:STRING,page_count:BIGINT,during_time:BIGINT>> COMMENT 'Page visit statistics'
    ) COMMENT 'Daily device behavior table'
    PARTITIONED BY(`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dws/dws_visitor_action_daycount'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data load

    insert overwrite table dws_visitor_action_daycount partition(dt='2020-06-14')
    select
        t1.mid_id,
        t1.brand,
        t1.model,
        t1.is_new,
        t1.channel,
        t1.os,
        t1.area_code,
        t1.version_code,
        t1.visit_count,
        t3.page_stats
    from
    (
        select
            mid_id,
            brand,
            model,
            -- Within one day, the is_new values of a single device in ods_page_log may be all 1,
            -- all 0, or a mix of 0 and 1 (uninstall and reinstall), hence this handling.
            if(array_contains(collect_set(is_new),'0'),'0','1') is_new,
            collect_set(channel) channel,
            collect_set(os) os,
            collect_set(area_code) area_code,
            collect_set(version_code) version_code,
            sum(if(last_page_id is null,1,0)) visit_count
        from dwd_page_log
        where dt='2020-06-14'
        and last_page_id is null
        group by mid_id,model,brand
    )t1
    join
    (
        select
            mid_id,
            brand,
            model,
            collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
        from
        (
            select
                mid_id,
                brand,
                model,
                page_id,
                count(*) page_count,
                sum(during_time) during_time
            from dwd_page_log
            where dt='2020-06-14'
            group by mid_id,model,brand,page_id
        )t2
        group by mid_id,model,brand
    )t3
    on t1.mid_id=t3.mid_id
    and t1.brand=t3.brand
    and t1.model=t3.model;
    

# User subject

  1. Table creation statement

    DROP TABLE IF EXISTS dws_user_action_daycount;
    CREATE EXTERNAL TABLE dws_user_action_daycount
    (
        `user_id` STRING COMMENT 'User ID',
        `login_count` BIGINT COMMENT 'Login count',
        `cart_count` BIGINT COMMENT 'Add-to-cart count',
        `favor_count` BIGINT COMMENT 'Favorite count',
        `order_count` BIGINT COMMENT 'Order count',
        `order_activity_count` BIGINT COMMENT 'Number of orders that took part in an activity',
        `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'Order discount amount (activity)',
        `order_coupon_count` BIGINT COMMENT 'Number of orders that used a coupon',
        `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'Order discount amount (coupon)',
        `order_original_amount` DECIMAL(16,2)  COMMENT 'Order original amount',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Order total amount',
        `payment_count` BIGINT COMMENT 'Payment count',
        `payment_amount` DECIMAL(16,2) COMMENT 'Payment amount',
        `refund_order_count` BIGINT COMMENT 'Refund order count',
        `refund_order_num` BIGINT COMMENT 'Refund order item count',
        `refund_order_amount` DECIMAL(16,2) COMMENT 'Refund order amount',
        `refund_payment_count` BIGINT COMMENT 'Refund payment count',
        `refund_payment_num` BIGINT COMMENT 'Refund payment item count',
        `refund_payment_amount` DECIMAL(16,2) COMMENT 'Refund payment amount',
        `coupon_get_count` BIGINT COMMENT 'Coupon claim count',
        `coupon_using_count` BIGINT COMMENT 'Coupon use (order placed) count',
        `coupon_used_count` BIGINT COMMENT 'Coupon use (payment) count',
        `appraise_good_count` BIGINT COMMENT 'Positive review count',
        `appraise_mid_count` BIGINT COMMENT 'Neutral review count',
        `appraise_bad_count` BIGINT COMMENT 'Negative review count',
        `appraise_default_count` BIGINT COMMENT 'Default review count',
        `order_detail_stats` array<struct<sku_id:string,sku_num:bigint,order_count:bigint,activity_reduce_amount:decimal(16,2),coupon_reduce_amount:decimal(16,2),original_amount:decimal(16,2),final_amount:decimal(16,2)>> COMMENT 'Order detail statistics'
    ) COMMENT 'Daily user behavior'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dws/dws_user_action_daycount/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data load

    1. First-day load

      with
      tmp_login as
      (
          select
              dt,
              user_id,
              count(*) login_count
          from dwd_page_log
          where user_id is not null
          and last_page_id is null
          group by dt,user_id
      ),
      tmp_cf as
      (
          select
              dt,
              user_id,
              sum(if(action_id='cart_add',1,0)) cart_count,
              sum(if(action_id='favor_add',1,0)) favor_count
          from dwd_action_log
          where user_id is not null
          and action_id in ('cart_add','favor_add')
          group by dt,user_id
      ),
      tmp_order as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              user_id,
              count(*) order_count,
              sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
              sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
              sum(activity_reduce_amount) order_activity_reduce_amount,
              sum(coupon_reduce_amount) order_coupon_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(final_amount) order_final_amount
          from dwd_order_info
          group by date_format(create_time,'yyyy-MM-dd'),user_id
      ),
      tmp_pay as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              user_id,
              count(*) payment_count,
              sum(payment_amount) payment_amount
          from dwd_payment_info
          group by date_format(callback_time,'yyyy-MM-dd'),user_id
      ),
      tmp_ri as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              user_id,
              count(*) refund_order_count,
              sum(refund_num) refund_order_num,
              sum(refund_amount) refund_order_amount
          from dwd_order_refund_info
          group by date_format(create_time,'yyyy-MM-dd'),user_id
      ),
      tmp_rp as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              rp.user_id,
              count(*) refund_payment_count,
              sum(ri.refund_num) refund_payment_num,
              sum(rp.refund_amount) refund_payment_amount
          from
          (
              select
                  user_id,
                  order_id,
                  sku_id,
                  refund_amount,
                  callback_time
              from dwd_refund_payment
          )rp
          left join
          (
              select
                  user_id,
                  order_id,
                  sku_id,
                  refund_num
              from dwd_order_refund_info
          )ri
          on rp.order_id=ri.order_id
          and rp.sku_id=ri.sku_id
          group by date_format(callback_time,'yyyy-MM-dd'),rp.user_id
      ),
      tmp_coupon as
      (
          select
              coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt) dt,
              coalesce(coupon_get.user_id,coupon_using.user_id,coupon_used.user_id) user_id,
              nvl(coupon_get_count,0) coupon_get_count,
              nvl(coupon_using_count,0) coupon_using_count,
              nvl(coupon_used_count,0) coupon_used_count
          from
          (
              select
                  date_format(get_time,'yyyy-MM-dd') dt,
                  user_id,
                  count(*) coupon_get_count
              from dwd_coupon_use
              where get_time is not null
              group by user_id,date_format(get_time,'yyyy-MM-dd')
          )coupon_get
          full outer join
          (
              select
                  date_format(using_time,'yyyy-MM-dd') dt,
                  user_id,
                  count(*) coupon_using_count
              from dwd_coupon_use
              where using_time is not null
              group by user_id,date_format(using_time,'yyyy-MM-dd')
          )coupon_using
          on coupon_get.dt=coupon_using.dt
          and coupon_get.user_id=coupon_using.user_id
          full outer join
          (
              select
                  date_format(used_time,'yyyy-MM-dd') dt,
                  user_id,
                  count(*) coupon_used_count
              from dwd_coupon_use
              where used_time is not null
              group by user_id,date_format(used_time,'yyyy-MM-dd')
          )coupon_used
          on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
          and nvl(coupon_get.user_id,coupon_using.user_id)=coupon_used.user_id
      ),
      tmp_comment as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              user_id,
              sum(if(appraise='1201',1,0)) appraise_good_count,
              sum(if(appraise='1202',1,0)) appraise_mid_count,
              sum(if(appraise='1203',1,0)) appraise_bad_count,
              sum(if(appraise='1204',1,0)) appraise_default_count
          from dwd_comment_info
          group by date_format(create_time,'yyyy-MM-dd'),user_id
      ),
      tmp_od as
      (
          select
              dt,
              user_id,
              collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
          from
          (
              select
                  date_format(create_time,'yyyy-MM-dd') dt,
                  user_id,
                  sku_id,
                  sum(sku_num) sku_num,
                  count(*) order_count,
                  cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
                  cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
                  cast(sum(original_amount) as decimal(16,2)) original_amount,
                  cast(sum(split_final_amount) as decimal(16,2)) final_amount
              from dwd_order_detail
              group by date_format(create_time,'yyyy-MM-dd'),user_id,sku_id
          )t1
          group by dt,user_id
      )
      insert overwrite table dws_user_action_daycount partition(dt)
      select
          coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
          nvl(login_count,0),
          nvl(cart_count,0),
          nvl(favor_count,0),
          nvl(order_count,0),
          nvl(order_activity_count,0),
          nvl(order_activity_reduce_amount,0),
          nvl(order_coupon_count,0),
          nvl(order_coupon_reduce_amount,0),
          nvl(order_original_amount,0),
          nvl(order_final_amount,0),
          nvl(payment_count,0),
          nvl(payment_amount,0),
          nvl(refund_order_count,0),
          nvl(refund_order_num,0),
          nvl(refund_order_amount,0),
          nvl(refund_payment_count,0),
          nvl(refund_payment_num,0),
          nvl(refund_payment_amount,0),
          nvl(coupon_get_count,0),
          nvl(coupon_using_count,0),
          nvl(coupon_used_count,0),
          nvl(appraise_good_count,0),
          nvl(appraise_mid_count,0),
          nvl(appraise_bad_count,0),
          nvl(appraise_default_count,0),
          order_detail_stats,
          coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt,tmp_od.dt)
      from tmp_login
      full outer join tmp_cf
      on tmp_login.user_id=tmp_cf.user_id
      and tmp_login.dt=tmp_cf.dt
      full outer join tmp_order
      on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt)=tmp_order.dt
      full outer join tmp_pay
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt)=tmp_pay.dt
      full outer join tmp_ri
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt)=tmp_ri.dt
      full outer join tmp_rp
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt)=tmp_rp.dt
      full outer join tmp_comment
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt)=tmp_comment.dt
      full outer join tmp_coupon
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt)=tmp_coupon.dt
      full outer join tmp_od
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt)=tmp_od.dt;
      
    2. Daily load

      with
      tmp_login as
      (
          select
              user_id,
              count(*) login_count
          from dwd_page_log
          where dt='2020-06-15'
          and user_id is not null
          and last_page_id is null
          group by user_id
      ),
      tmp_cf as
      (
          select
              user_id,
              sum(if(action_id='cart_add',1,0)) cart_count,
              sum(if(action_id='favor_add',1,0)) favor_count
          from dwd_action_log
          where dt='2020-06-15'
          and user_id is not null
          and action_id in ('cart_add','favor_add')
          group by user_id
      ),
      tmp_order as
      (
          select
              user_id,
              count(*) order_count,
              sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
              sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
              sum(activity_reduce_amount) order_activity_reduce_amount,
              sum(coupon_reduce_amount) order_coupon_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(final_amount) order_final_amount
          from dwd_order_info
          where (dt='2020-06-15'
          or dt='9999-99-99')
          and date_format(create_time,'yyyy-MM-dd')='2020-06-15'
          group by user_id
      ),
      tmp_pay as
      (
          select
              user_id,
              count(*) payment_count,
              sum(payment_amount) payment_amount
          from dwd_payment_info
          where dt='2020-06-15'
          group by user_id
      ),
      tmp_ri as
      (
          select
              user_id,
              count(*) refund_order_count,
              sum(refund_num) refund_order_num,
              sum(refund_amount) refund_order_amount
          from dwd_order_refund_info
          where dt='2020-06-15'
          group by user_id
      ),
      tmp_rp as
      (
          select
              rp.user_id,
              count(*) refund_payment_count,
              sum(ri.refund_num) refund_payment_num,
              sum(rp.refund_amount) refund_payment_amount
          from
          (
              select
                  user_id,
                  order_id,
                  sku_id,
                  refund_amount
              from dwd_refund_payment
              where dt='2020-06-15'
          )rp
          left join
          (
              select
                  user_id,
                  order_id,
                  sku_id,
                  refund_num
              from dwd_order_refund_info
              where dt>=date_add('2020-06-15',-15)
          )ri
          on rp.order_id=ri.order_id
          and rp.sku_id=ri.sku_id
          group by rp.user_id
      ),
      tmp_coupon as
      (
          select
              user_id,
              sum(if(date_format(get_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_get_count,
              sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_using_count,
              sum(if(date_format(used_time,'yyyy-MM-dd')='2020-06-15',1,0)) coupon_used_count
          from dwd_coupon_use
          where (dt='2020-06-15' or dt='9999-99-99')
          and (date_format(get_time, 'yyyy-MM-dd') = '2020-06-15'
          or date_format(using_time,'yyyy-MM-dd')='2020-06-15'
          or date_format(used_time,'yyyy-MM-dd')='2020-06-15')
          group by user_id
      ),
      tmp_comment as
      (
          select
              user_id,
              sum(if(appraise='1201',1,0)) appraise_good_count,
              sum(if(appraise='1202',1,0)) appraise_mid_count,
              sum(if(appraise='1203',1,0)) appraise_bad_count,
              sum(if(appraise='1204',1,0)) appraise_default_count
          from dwd_comment_info
          where dt='2020-06-15'
          group by user_id
      ),
      tmp_od as
      (
          select
              user_id,
              collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
          from
          (
              select
                  user_id,
                  sku_id,
                  sum(sku_num) sku_num,
                  count(*) order_count,
                  cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
                  cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
                  cast(sum(original_amount) as decimal(16,2)) original_amount,
                  cast(sum(split_final_amount) as decimal(16,2)) final_amount
              from dwd_order_detail
              where dt='2020-06-15'
              group by user_id,sku_id
          )t1
          group by user_id
      )
      insert overwrite table dws_user_action_daycount partition(dt='2020-06-15')
      select
          coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
          nvl(login_count,0),
          nvl(cart_count,0),
          nvl(favor_count,0),
          nvl(order_count,0),
          nvl(order_activity_count,0),
          nvl(order_activity_reduce_amount,0),
          nvl(order_coupon_count,0),
          nvl(order_coupon_reduce_amount,0),
          nvl(order_original_amount,0),
          nvl(order_final_amount,0),
          nvl(payment_count,0),
          nvl(payment_amount,0),
          nvl(refund_order_count,0),
          nvl(refund_order_num,0),
          nvl(refund_order_amount,0),
          nvl(refund_payment_count,0),
          nvl(refund_payment_num,0),
          nvl(refund_payment_amount,0),
          nvl(coupon_get_count,0),
          nvl(coupon_using_count,0),
          nvl(coupon_used_count,0),
          nvl(appraise_good_count,0),
          nvl(appraise_mid_count,0),
          nvl(appraise_bad_count,0),
          nvl(appraise_default_count,0),
          order_detail_stats
      from tmp_login
      full outer join tmp_cf on tmp_login.user_id=tmp_cf.user_id
      full outer join tmp_order on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
      full outer join tmp_pay on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
      full outer join tmp_ri on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
      full outer join tmp_rp on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
      full outer join tmp_comment on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
      full outer join tmp_coupon on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
      full outer join tmp_od on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id;
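
The daily load hard-codes the date 2020-06-15 in every filter and in the target partition. One way to reuse it for other days is to generate the SQL from a shell script that takes the date as a parameter. The sketch below is only an illustration under stated assumptions: the function names `resolve_do_date` and `build_login_sql` are hypothetical, only the login-count query is shown (the other CTEs would follow the same pattern), and `date -d` assumes GNU date.

```shell
#!/bin/bash
# Sketch only: the function names below are hypothetical, not from the original notes.

# Resolve the load date: the first argument if given, otherwise yesterday.
# `date -d` is GNU date syntax (as found on typical Linux servers).
resolve_do_date() {
  if [ -n "$1" ]; then
    echo "$1"
  else
    date -d "-1 day" +%F
  fi
}

# Print the login-count query with the load date ($1) filled in.
build_login_sql() {
  cat <<EOF
select
    user_id,
    count(*) login_count
from dwd_page_log
where dt='$1'
and user_id is not null
and last_page_id is null
group by user_id;
EOF
}

do_date=$(resolve_do_date "$1")
build_login_sql "$do_date"
```

Run without arguments, the script prints the query for yesterday's date; passing a date such as 2020-06-16 prints the query for that day. The printed text could then be handed to Hive, for example with `hive -e`.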
      

# Product subject

  1. Table creation statement

    DROP TABLE IF EXISTS dws_sku_action_daycount;
    CREATE EXTERNAL TABLE dws_sku_action_daycount
    (
        `sku_id` STRING COMMENT 'sku_id',
        `order_count` BIGINT COMMENT 'number of times ordered',
        `order_num` BIGINT COMMENT 'number of items ordered',
        `order_activity_count` BIGINT COMMENT 'number of times ordered under an activity',
        `order_coupon_count` BIGINT COMMENT 'number of times ordered with a coupon',
        `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'discount amount (activity)',
        `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'discount amount (coupon)',
        `order_original_amount` DECIMAL(16,2) COMMENT 'original amount ordered',
        `order_final_amount` DECIMAL(16,2) COMMENT 'final amount ordered',
        `payment_count` BIGINT COMMENT 'number of times paid',
        `payment_num` BIGINT COMMENT 'number of items paid',
        `payment_amount` DECIMAL(16,2) COMMENT 'amount paid',
        `refund_order_count` BIGINT COMMENT 'number of refund orders',
        `refund_order_num` BIGINT COMMENT 'number of items in refund orders',
        `refund_order_amount` DECIMAL(16,2) COMMENT 'refund order amount',
        `refund_payment_count` BIGINT COMMENT 'number of refund payments',
        `refund_payment_num` BIGINT COMMENT 'number of items in refund payments',
        `refund_payment_amount` DECIMAL(16,2) COMMENT 'refund payment amount',
        `cart_count` BIGINT COMMENT 'number of times added to cart',
        `favor_count` BIGINT COMMENT 'number of times favorited',
        `appraise_good_count` BIGINT COMMENT 'number of positive reviews',
        `appraise_mid_count` BIGINT COMMENT 'number of neutral reviews',
        `appraise_bad_count` BIGINT COMMENT 'number of negative reviews',
        `appraise_default_count` BIGINT COMMENT 'number of default reviews'
    ) COMMENT 'daily product actions'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dws/dws_sku_action_daycount/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data loading

    1. First-day load

      with
      tmp_order as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              sku_id,
              count(*) order_count,
              sum(sku_num) order_num,
              sum(if(split_activity_amount>0,1,0)) order_activity_count,
              sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
              sum(split_activity_amount) order_activity_reduce_amount,
              sum(split_coupon_amount) order_coupon_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(split_final_amount) order_final_amount
          from dwd_order_detail
          group by date_format(create_time,'yyyy-MM-dd'),sku_id
      ),
      tmp_pay as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              sku_id,
              count(*) payment_count,
              sum(sku_num) payment_num,
              sum(split_final_amount) payment_amount
          from dwd_order_detail od
          join
          (
              select
                  order_id,
                  callback_time
              from dwd_payment_info
              where callback_time is not null
          )pi on pi.order_id=od.order_id
          group by date_format(callback_time,'yyyy-MM-dd'),sku_id
      ),
      tmp_ri as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              sku_id,
              count(*) refund_order_count,
              sum(refund_num) refund_order_num,
              sum(refund_amount) refund_order_amount
          from dwd_order_refund_info
          group by date_format(create_time,'yyyy-MM-dd'),sku_id
      ),
      tmp_rp as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              rp.sku_id,
              count(*) refund_payment_count,
              sum(ri.refund_num) refund_payment_num,
              sum(refund_amount) refund_payment_amount
          from
          (
              select
                  order_id,
                  sku_id,
                  refund_amount,
                  callback_time
              from dwd_refund_payment
          )rp
          left join
          (
              select
                  order_id,
                  sku_id,
                  refund_num
              from dwd_order_refund_info
          )ri
          on rp.order_id=ri.order_id
          and rp.sku_id=ri.sku_id
          group by date_format(callback_time,'yyyy-MM-dd'),rp.sku_id
      ),
      tmp_cf as
      (
          select
              dt,
              item sku_id,
              sum(if(action_id='cart_add',1,0)) cart_count,
              sum(if(action_id='favor_add',1,0)) favor_count
          from dwd_action_log
          where action_id in ('cart_add','favor_add')
          group by dt,item
      ),
      tmp_comment as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              sku_id,
              sum(if(appraise='1201',1,0)) appraise_good_count,
              sum(if(appraise='1202',1,0)) appraise_mid_count,
              sum(if(appraise='1203',1,0)) appraise_bad_count,
              sum(if(appraise='1204',1,0)) appraise_default_count
          from dwd_comment_info
          group by date_format(create_time,'yyyy-MM-dd'),sku_id
      )
      insert overwrite table dws_sku_action_daycount partition(dt)
      select
          sku_id,
          sum(order_count),
          sum(order_num),
          sum(order_activity_count),
          sum(order_coupon_count),
          sum(order_activity_reduce_amount),
          sum(order_coupon_reduce_amount),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_num),
          sum(payment_amount),
          sum(refund_order_count),
          sum(refund_order_num),
          sum(refund_order_amount),
          sum(refund_payment_count),
          sum(refund_payment_num),
          sum(refund_payment_amount),
          sum(cart_count),
          sum(favor_count),
          sum(appraise_good_count),
          sum(appraise_mid_count),
          sum(appraise_bad_count),
          sum(appraise_default_count),
          dt
      from
      (
          select
              dt,
              sku_id,
              order_count,
              order_num,
              order_activity_count,
              order_coupon_count,
              order_activity_reduce_amount,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_order
          union all
          select
              dt,
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              payment_num,
              payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_pay
          union all
          select
              dt,
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_ri
          union all
          select
              dt,
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_rp
          union all
          select
              dt,
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              cart_count,
              favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_cf
          union all
          select
              dt,
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from tmp_comment
      )t1
      group by dt,sku_id;
      
    2. Daily load

      with
      tmp_order as
      (
          select
              sku_id,
              count(*) order_count,
              sum(sku_num) order_num,
              sum(if(split_activity_amount>0,1,0)) order_activity_count,
              sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
              sum(split_activity_amount) order_activity_reduce_amount,
              sum(split_coupon_amount) order_coupon_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(split_final_amount) order_final_amount
          from dwd_order_detail
          where dt='2020-06-15'
          group by sku_id
      ),
      tmp_pay as
      (
          select
              sku_id,
              count(*) payment_count,
              sum(sku_num) payment_num,
              sum(split_final_amount) payment_amount
          from dwd_order_detail
          where (dt='2020-06-15'
          or dt=date_add('2020-06-15',-1))
          and order_id in
          (
              select order_id from dwd_payment_info where dt='2020-06-15'
          )
          group by sku_id
      ),
      tmp_ri as
      (
          select
              sku_id,
              count(*) refund_order_count,
              sum(refund_num) refund_order_num,
              sum(refund_amount) refund_order_amount
          from dwd_order_refund_info
          where dt='2020-06-15'
          group by sku_id
      ),
      tmp_rp as
      (
          select
              rp.sku_id,
              count(*) refund_payment_count,
              sum(ri.refund_num) refund_payment_num,
              sum(refund_amount) refund_payment_amount
          from
          (
              select
                  order_id,
                  sku_id,
                  refund_amount
              from dwd_refund_payment
              where dt='2020-06-15'
          )rp
          left join
          (
              select
                  order_id,
                  sku_id,
                  refund_num
              from dwd_order_refund_info
              where dt>=date_add('2020-06-15',-15)
          )ri
          on rp.order_id=ri.order_id
          and rp.sku_id=ri.sku_id
          group by rp.sku_id
      ),
      tmp_cf as
      (
          select
              item sku_id,
              sum(if(action_id='cart_add',1,0)) cart_count,
              sum(if(action_id='favor_add',1,0)) favor_count
          from dwd_action_log
          where dt='2020-06-15'
          and action_id in ('cart_add','favor_add')
          group by item
      ),
      tmp_comment as
      (
          select
              sku_id,
              sum(if(appraise='1201',1,0)) appraise_good_count,
              sum(if(appraise='1202',1,0)) appraise_mid_count,
              sum(if(appraise='1203',1,0)) appraise_bad_count,
              sum(if(appraise='1204',1,0)) appraise_default_count
          from dwd_comment_info
          where dt='2020-06-15'
          group by sku_id
      )
      insert overwrite table dws_sku_action_daycount partition(dt='2020-06-15')
      select
          sku_id,
          sum(order_count),
          sum(order_num),
          sum(order_activity_count),
          sum(order_coupon_count),
          sum(order_activity_reduce_amount),
          sum(order_coupon_reduce_amount),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_num),
          sum(payment_amount),
          sum(refund_order_count),
          sum(refund_order_num),
          sum(refund_order_amount),
          sum(refund_payment_count),
          sum(refund_payment_num),
          sum(refund_payment_amount),
          sum(cart_count),
          sum(favor_count),
          sum(appraise_good_count),
          sum(appraise_mid_count),
          sum(appraise_bad_count),
          sum(appraise_default_count)
      from
      (
          select
              sku_id,
              order_count,
              order_num,
              order_activity_count,
              order_coupon_count,
              order_activity_reduce_amount,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_order
          union all
          select
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              payment_num,
              payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_pay
          union all
          select
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_ri
          union all
          select
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_rp
          union all
          select
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              cart_count,
              favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_cf
          union all
          select
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from tmp_comment
      )t1
      group by sku_id;
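
The daily load above merges six metric sources by padding each one with zeros for the columns it does not supply, stacking them with `union all`, and summing per `sku_id`. A minimal sketch of that pattern, assuming SQLite as a stand-in for Hive and two tiny hypothetical tables (`tmp_order`, `tmp_pay`) with made-up rows:

```python
import sqlite3

# In-memory SQLite database; the table contents are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tmp_order (sku_id TEXT, order_count INTEGER);
    CREATE TABLE tmp_pay   (sku_id TEXT, payment_count INTEGER);
    INSERT INTO tmp_order VALUES ('sku_1', 3), ('sku_2', 1);
    INSERT INTO tmp_pay   VALUES ('sku_1', 2), ('sku_3', 4);
""")

# Each branch supplies its own metric and a literal 0 for the other one,
# so the outer SUM ... GROUP BY yields one row per sku_id.
rows = conn.execute("""
    SELECT sku_id, SUM(order_count), SUM(payment_count)
    FROM (
        SELECT sku_id, order_count, 0 AS payment_count FROM tmp_order
        UNION ALL
        SELECT sku_id, 0 AS order_count, payment_count FROM tmp_pay
    ) t1
    GROUP BY sku_id
    ORDER BY sku_id
""").fetchall()

print(rows)
```

A SKU that appears in only one source still gets a row, with 0 rather than NULL in the metrics it lacks, which is why this shape is often preferred over a chain of outer joins.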
      

# Coupon topic

  1. Table creation statement

    DROP TABLE IF EXISTS dws_coupon_info_daycount;
    CREATE EXTERNAL TABLE dws_coupon_info_daycount(
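        `coupon_id` STRING COMMENT 'Coupon ID',
        `get_count` BIGINT COMMENT 'Number of times claimed',
        `order_count` BIGINT COMMENT 'Number of times used (order placed)',
        `order_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount on orders placed with the coupon',
        `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders placed with the coupon',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders placed with the coupon',
        `payment_count` BIGINT COMMENT 'Number of times used (payment made)',
        `payment_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount on payments made with the coupon',
        `payment_amount` DECIMAL(16,2) COMMENT 'Total amount of payments made with the coupon',
        `expire_count` BIGINT COMMENT 'Number of times expired'
    ) COMMENT 'Daily coupon statistics'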
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dws/dws_coupon_info_daycount/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data loading

    1. First-day load

      with
      tmp_cu as
      (
          select
              coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt,coupon_expire.dt) dt,
              coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id,coupon_expire.coupon_id) coupon_id,
              nvl(get_count,0) get_count,
              nvl(order_count,0) order_count,
              nvl(payment_count,0) payment_count,
              nvl(expire_count,0) expire_count
          from
          (
              select
                  date_format(get_time,'yyyy-MM-dd') dt,
                  coupon_id,
                  count(*) get_count
              from dwd_coupon_use
              group by date_format(get_time,'yyyy-MM-dd'),coupon_id
          )coupon_get
          full outer join
          (
              select
                  date_format(using_time,'yyyy-MM-dd') dt,
                  coupon_id,
                  count(*) order_count
              from dwd_coupon_use
              where using_time is not null
              group by date_format(using_time,'yyyy-MM-dd'),coupon_id
          )coupon_using
          on coupon_get.dt=coupon_using.dt
          and coupon_get.coupon_id=coupon_using.coupon_id
          full outer join
          (
              select
                  date_format(used_time,'yyyy-MM-dd') dt,
                  coupon_id,
                  count(*) payment_count
              from dwd_coupon_use
              where used_time is not null
              group by date_format(used_time,'yyyy-MM-dd'),coupon_id
          )coupon_used
          on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
          and nvl(coupon_get.coupon_id,coupon_using.coupon_id)=coupon_used.coupon_id
          full outer join
          (
              select
                  date_format(expire_time,'yyyy-MM-dd') dt,
                  coupon_id,
                  count(*) expire_count
              from dwd_coupon_use
              where expire_time is not null
              group by date_format(expire_time,'yyyy-MM-dd'),coupon_id
          )coupon_expire
          on coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt)=coupon_expire.dt
          and coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id)=coupon_expire.coupon_id
      ),
      tmp_order as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              coupon_id,
              sum(split_coupon_amount) order_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(split_final_amount) order_final_amount
          from dwd_order_detail
          where coupon_id is not null
          group by date_format(create_time,'yyyy-MM-dd'),coupon_id
      ),
      tmp_pay as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              coupon_id,
              sum(split_coupon_amount) payment_reduce_amount,
              sum(split_final_amount) payment_amount
          from
          (
              select
                  order_id,
                  coupon_id,
                  split_coupon_amount,
                  split_final_amount
              from dwd_order_detail
              where coupon_id is not null
          )od
          join
          (
              select
                  order_id,
                  callback_time
              from dwd_payment_info
          )pi
          on od.order_id=pi.order_id
          group by date_format(callback_time,'yyyy-MM-dd'),coupon_id
      )
      insert overwrite table dws_coupon_info_daycount partition(dt)
      select
          coupon_id,
          sum(get_count),
          sum(order_count),
          sum(order_reduce_amount),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_reduce_amount),
          sum(payment_amount),
          sum(expire_count),
          dt
      from
      (
          select
              dt,
              coupon_id,
              get_count,
              order_count,
              0 order_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              0 payment_reduce_amount,
              0 payment_amount,
              expire_count
          from tmp_cu
          union all
          select
              dt,
              coupon_id,
              0 get_count,
              0 order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_reduce_amount,
              0 payment_amount,
              0 expire_count
          from tmp_order
          union all
          select
              dt,
              coupon_id,
              0 get_count,
              0 order_count,
              0 order_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              payment_reduce_amount,
              payment_amount,
              0 expire_count
          from tmp_pay
      )t1
      group by dt,coupon_id;
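
The first-day load above derives `dt` from each event timestamp and groups by it, so one pass over the full history fills every daily partition. A minimal sketch of that idea, assuming SQLite as a stand-in for Hive, a hypothetical `coupon_use` table with made-up rows, and `substr` in place of Hive's `date_format`. It stacks the event streams with `UNION ALL` instead of the chained full outer joins used above, which is a substitution made here for brevity and not the original formulation:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coupon_use (coupon_id TEXT, get_time TEXT, using_time TEXT);
    INSERT INTO coupon_use VALUES
        ('c1', '2020-06-13 09:00:00', '2020-06-14 10:00:00'),
        ('c1', '2020-06-14 11:00:00', NULL),
        ('c2', '2020-06-14 12:00:00', '2020-06-14 12:30:00');
""")

# Every row contributes a "claimed" event on its get_time date and, when
# using_time is set, an "order" event on that date. Grouping by the derived
# dt yields one result row per (day, coupon), i.e. one row per target partition.
rows = conn.execute("""
    SELECT dt, coupon_id, SUM(get_count), SUM(order_count)
    FROM (
        SELECT substr(get_time, 1, 10) AS dt, coupon_id,
               1 AS get_count, 0 AS order_count
        FROM coupon_use
        UNION ALL
        SELECT substr(using_time, 1, 10) AS dt, coupon_id, 0, 1
        FROM coupon_use
        WHERE using_time IS NOT NULL
    ) t
    GROUP BY dt, coupon_id
    ORDER BY dt, coupon_id
""").fetchall()

for row in rows:
    print(row)
```

Note that the same coupon record can land in different days for different metrics: the first `c1` row counts as claimed on 2020-06-13 but as used on 2020-06-14.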
      
    2. Daily load

      with
      tmp_cu as
      (
          select
              coupon_id,
              sum(if(date_format(get_time,'yyyy-MM-dd')='2020-06-15',1,0)) get_count,
              sum(if(date_format(using_time,'yyyy-MM-dd')='2020-06-15',1,0)) order_count,
              sum(if(date_format(used_time,'yyyy-MM-dd')='2020-06-15',1,0)) payment_count,
              sum(if(date_format(expire_time,'yyyy-MM-dd')='2020-06-15',1,0)) expire_count
          from dwd_coupon_use
          where dt='9999-99-99'
          or dt='2020-06-15'
          group by coupon_id
      ),
      tmp_order as
      (
          select
              coupon_id,
              sum(split_coupon_amount) order_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(split_final_amount) order_final_amount
          from dwd_order_detail
          where dt='2020-06-15'
          and coupon_id is not null
          group by coupon_id
      ),
      tmp_pay as
      (
          select
              coupon_id,
              sum(split_coupon_amount) payment_reduce_amount,
              sum(split_final_amount) payment_amount
          from dwd_order_detail
          where (dt='2020-06-15'
          or dt=date_add('2020-06-15',-1))
          and coupon_id is not null
          and order_id in
          (
              select order_id from dwd_payment_info where dt='2020-06-15'
          )
          group by coupon_id
      )
      insert overwrite table dws_coupon_info_daycount partition(dt='2020-06-15')
      select
          coupon_id,
          sum(get_count),
          sum(order_count),
          sum(order_reduce_amount),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_reduce_amount),
          sum(payment_amount),
          sum(expire_count)
      from
      (
          select
              coupon_id,
              get_count,
              order_count,
              0 order_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              0 payment_reduce_amount,
              0 payment_amount,
              expire_count
          from tmp_cu
          union all
          select
              coupon_id,
              0 get_count,
              0 order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_reduce_amount,
              0 payment_amount,
              0 expire_count
          from tmp_order
          union all
          select
              coupon_id,
              0 get_count,
              0 order_count,
              0 order_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              payment_reduce_amount,
              payment_amount,
              0 expire_count
          from tmp_pay
      )t1
      group by coupon_id;
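
In the daily load above, `tmp_cu` computes several counts in a single scan by wrapping a date test in `sum(if(...,1,0))`. A minimal sketch of that conditional-count pattern, assuming SQLite as a stand-in for Hive (so `CASE WHEN` replaces Hive's `if`, and `substr` replaces `date_format`) and a hypothetical `coupon_use` table with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE coupon_use (coupon_id TEXT, get_time TEXT, used_time TEXT);
    INSERT INTO coupon_use VALUES
        ('c1', '2020-06-15 08:00:00', '2020-06-15 09:00:00'),
        ('c1', '2020-06-14 08:00:00', '2020-06-14 10:00:00'),
        ('c1', '2020-06-15 11:00:00', NULL);
""")

day = "2020-06-15"

# One pass over the table: each SUM counts only the rows whose timestamp
# falls on the target day. A NULL timestamp fails the comparison and adds 0.
row = conn.execute("""
    SELECT coupon_id,
           SUM(CASE WHEN substr(get_time, 1, 10) = ? THEN 1 ELSE 0 END) AS get_count,
           SUM(CASE WHEN substr(used_time, 1, 10) = ? THEN 1 ELSE 0 END) AS payment_count
    FROM coupon_use
    GROUP BY coupon_id
""", (day, day)).fetchone()

print(row)
```

Because every count comes from the same scan, adding another metric costs one more expression instead of one more subquery and join.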
      

# Activity topic

  1. Table creation statement

    DROP TABLE IF EXISTS dws_activity_info_daycount;
    CREATE EXTERNAL TABLE dws_activity_info_daycount(
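        `activity_rule_id` STRING COMMENT 'Activity rule ID',
        `activity_id` STRING COMMENT 'Activity ID',
        `order_count` BIGINT COMMENT 'Number of orders placed under this activity rule',
        `order_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount on orders placed under this activity rule',
        `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders placed under this activity rule',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders placed under this activity rule',
        `payment_count` BIGINT COMMENT 'Number of payments made under this activity rule',
        `payment_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount on payments made under this activity rule',
        `payment_amount` DECIMAL(16,2) COMMENT 'Amount of payments made under this activity rule'
    ) COMMENT 'Daily activity statistics'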
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dws/dws_activity_info_daycount/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data loading

    1. First-day load

      with
      tmp_order as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              activity_rule_id,
              activity_id,
              count(*) order_count,
              sum(split_activity_amount) order_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(split_final_amount) order_final_amount
          from dwd_order_detail
          where activity_id is not null
          group by date_format(create_time,'yyyy-MM-dd'),activity_rule_id,activity_id
      ),
      tmp_pay as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              activity_rule_id,
              activity_id,
              count(*) payment_count,
              sum(split_activity_amount) payment_reduce_amount,
              sum(split_final_amount) payment_amount
          from
          (
              select
                  activity_rule_id,
                  activity_id,
                  order_id,
                  split_activity_amount,
                  split_final_amount
              from dwd_order_detail
              where activity_id is not null
          )od
          join
          (
              select
                  order_id,
                  callback_time
              from dwd_payment_info
          )pi
          on od.order_id=pi.order_id
          group by date_format(callback_time,'yyyy-MM-dd'),activity_rule_id,activity_id
      )
      insert overwrite table dws_activity_info_daycount partition(dt)
      select
          activity_rule_id,
          activity_id,
          sum(order_count),
          sum(order_reduce_amount),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_reduce_amount),
          sum(payment_amount),
          dt
      from
      (
          select
              dt,
              activity_rule_id,
              activity_id,
              order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_reduce_amount,
              0 payment_amount
          from tmp_order
          union all
          select
              dt,
              activity_rule_id,
              activity_id,
              0 order_count,
              0 order_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              payment_reduce_amount,
              payment_amount
          from tmp_pay
      )t1
      group by dt,activity_rule_id,activity_id;
      
    2. Daily load

      with
      tmp_order as
      (
          select
              activity_rule_id,
              activity_id,
              count(*) order_count,
              sum(split_activity_amount) order_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(split_final_amount) order_final_amount
          from dwd_order_detail
          where dt='2020-06-15'
          and activity_id is not null
          group by activity_rule_id,activity_id
      ),
      tmp_pay as
      (
          select
              activity_rule_id,
              activity_id,
              count(*) payment_count,
              sum(split_activity_amount) payment_reduce_amount,
              sum(split_final_amount) payment_amount
          from dwd_order_detail
          where (dt='2020-06-15'
          or dt=date_add('2020-06-15',-1))
          and activity_id is not null
          and order_id in
          (
              select order_id from dwd_payment_info where dt='2020-06-15'
          )
          group by activity_rule_id,activity_id
      )
      insert overwrite table dws_activity_info_daycount partition(dt='2020-06-15')
      select
          activity_rule_id,
          activity_id,
          sum(order_count),
          sum(order_reduce_amount),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_reduce_amount),
          sum(payment_amount)
      from
      (
          select
              activity_rule_id,
              activity_id,
              order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_reduce_amount,
              0 payment_amount
          from tmp_order
          union all
          select
              activity_rule_id,
              activity_id,
              0 order_count,
              0 order_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              payment_reduce_amount,
              payment_amount
          from tmp_pay
      )t1
      group by activity_rule_id,activity_id;
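
In the daily load above, `tmp_pay` reads two partitions of `dwd_order_detail` (the target day and the day before) and keeps only the orders paid on the target day. The likely reason, stated here as an assumption, is that an order can be placed on one day and paid on the next. A minimal sketch of the effect, assuming SQLite as a stand-in for Hive, hypothetical tables `order_detail` and `payment_info` with made-up rows, and a hypothetical helper `payments_for`:

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_detail (dt TEXT, order_id TEXT, activity_id TEXT, amount INTEGER);
    CREATE TABLE payment_info (dt TEXT, order_id TEXT);
    INSERT INTO order_detail VALUES
        ('2020-06-14', 'o1', 'a1', 100),
        ('2020-06-15', 'o2', 'a1', 50),
        ('2020-06-15', 'o3', 'a1', 70);
    INSERT INTO payment_info VALUES
        ('2020-06-15', 'o1'),
        ('2020-06-15', 'o2');
""")


def payments_for(day, lookback_days):
    """Aggregate order lines paid on `day`, also scanning `lookback_days` earlier order partitions."""
    first = (date.fromisoformat(day) - timedelta(days=lookback_days)).isoformat()
    return conn.execute(
        """
        SELECT activity_id, COUNT(*), SUM(amount)
        FROM order_detail
        WHERE dt BETWEEN ? AND ?
          AND order_id IN (SELECT order_id FROM payment_info WHERE dt = ?)
        GROUP BY activity_id
        """,
        (first, day, day),
    ).fetchall()


# Order o1 was placed on the 14th and paid on the 15th, so a scan limited
# to the 15th's order partition misses it; o3 is unpaid and never counted.
same_day_only = payments_for("2020-06-15", 0)
with_previous_day = payments_for("2020-06-15", 1)
print(same_day_only, with_previous_day)
```

A one-day lookback only covers orders paid no later than the day after they were placed; whether that holds depends on the payment timeout of the source system.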
      

# Region topic

  1. Table creation statement

    DROP TABLE IF EXISTS dws_area_stats_daycount;
    CREATE EXTERNAL TABLE dws_area_stats_daycount(
        `province_id` STRING COMMENT 'Region ID',
        `visit_count` BIGINT COMMENT 'Number of visits',
        `login_count` BIGINT COMMENT 'Number of logins',
        `visitor_count` BIGINT COMMENT 'Number of visitors',
        `user_count` BIGINT COMMENT 'Number of users',
        `order_count` BIGINT COMMENT 'Number of orders',
        `order_original_amount` DECIMAL(16,2) COMMENT 'Original order amount',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Final order amount',
        `payment_count` BIGINT COMMENT 'Number of payments',
        `payment_amount` DECIMAL(16,2) COMMENT 'Payment amount',
        `refund_order_count` BIGINT COMMENT 'Number of order refunds',
        `refund_order_amount` DECIMAL(16,2) COMMENT 'Order refund amount',
        `refund_payment_count` BIGINT COMMENT 'Number of refund payments',
        `refund_payment_amount` DECIMAL(16,2) COMMENT 'Refund payment amount'
    ) COMMENT 'Daily region statistics table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dws/dws_area_stats_daycount/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data loading

    1. First-day load

      set hive.exec.dynamic.partition.mode=nonstrict;
      with
      tmp_vu as
      (
          select
              dt,
              id province_id,
              visit_count,
              login_count,
              visitor_count,
              user_count
          from
          (
              select
                  dt,
                  area_code,
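                  count(*) visit_count, -- number of visits (sessions)
                  count(user_id) login_count, -- number of logged-in visits, equivalent to sum(if(user_id is not null,1,0))
                  count(distinct(mid_id)) visitor_count, -- number of distinct visitors (devices)
                  count(distinct(user_id)) user_count -- number of distinct users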
              from dwd_page_log
              where last_page_id is null
              group by dt,area_code
          )tmp
          left join dim_base_province area
          on tmp.area_code=area.area_code
      ),
      tmp_order as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              province_id,
              count(*) order_count,
              sum(original_amount) order_original_amount,
              sum(final_amount) order_final_amount
          from dwd_order_info
          group by date_format(create_time,'yyyy-MM-dd'),province_id
      ),
      tmp_pay as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              province_id,
              count(*) payment_count,
              sum(payment_amount) payment_amount
          from dwd_payment_info
          group by date_format(callback_time,'yyyy-MM-dd'),province_id
      ),
      tmp_ro as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              province_id,
              count(*) refund_order_count,
              sum(refund_amount) refund_order_amount
          from dwd_order_refund_info
          group by date_format(create_time,'yyyy-MM-dd'),province_id
      ),
      tmp_rp as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              province_id,
              count(*) refund_payment_count,
              sum(refund_amount) refund_payment_amount
          from dwd_refund_payment
          group by date_format(callback_time,'yyyy-MM-dd'),province_id
      )
      insert overwrite table dws_area_stats_daycount partition(dt)
      select
          province_id,
          sum(visit_count),
          sum(login_count),
          sum(visitor_count),
          sum(user_count),
          sum(order_count),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_amount),
          sum(refund_order_count),
          sum(refund_order_amount),
          sum(refund_payment_count),
          sum(refund_payment_amount),
          dt
      from
      (
          select
              dt,
              province_id,
              visit_count,
              login_count,
              visitor_count,
              user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_vu
          union all
          select
              dt,
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              order_count,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_order
          union all
          select
              dt,
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_pay
          union all
          select
              dt,
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_amount,
              refund_order_count,
              refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_ro
          union all
          select
              dt,
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              refund_payment_count,
              refund_payment_amount
          from tmp_rp
      )t1
      group by dt,province_id;
      
    2. Daily load

      with
      tmp_vu as
      (
          select
              id province_id,
              visit_count,
              login_count,
              visitor_count,
              user_count
          from
          (
              select
                  area_code,
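                  count(*) visit_count, -- number of visits (sessions)
                  count(user_id) login_count, -- number of logged-in visits, equivalent to sum(if(user_id is not null,1,0))
                  count(distinct(mid_id)) visitor_count, -- number of distinct visitors (devices)
                  count(distinct(user_id)) user_count -- number of distinct users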
              from dwd_page_log
              where dt='2020-06-15'
              and last_page_id is null
              group by area_code
          )tmp
          left join dim_base_province area
          on tmp.area_code=area.area_code
      ),
      tmp_order as
      (
          select
              province_id,
              count(*) order_count,
              sum(original_amount) order_original_amount,
              sum(final_amount) order_final_amount
          from dwd_order_info
          where (dt='2020-06-15' or dt='9999-99-99')
          and date_format(create_time,'yyyy-MM-dd')='2020-06-15'
          group by province_id
      ),
      tmp_pay as
      (
          select
              province_id,
              count(*) payment_count,
              sum(payment_amount) payment_amount
          from dwd_payment_info
          where dt='2020-06-15'
          group by province_id
      ),
      tmp_ro as
      (
          select
              province_id,
              count(*) refund_order_count,
              sum(refund_amount) refund_order_amount
          from dwd_order_refund_info
          where dt='2020-06-15'
          group by province_id
      ),
      tmp_rp as
      (
          select
              province_id,
              count(*) refund_payment_count,
              sum(refund_amount) refund_payment_amount
          from dwd_refund_payment
          where dt='2020-06-15'
          group by province_id
      )
      insert overwrite table dws_area_stats_daycount partition(dt='2020-06-15')
      select
          province_id,
          sum(visit_count),
          sum(login_count),
          sum(visitor_count),
          sum(user_count),
          sum(order_count),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_amount),
          sum(refund_order_count),
          sum(refund_order_amount),
          sum(refund_payment_count),
          sum(refund_payment_amount)
      from
      (
          select
              province_id,
              visit_count,
              login_count,
              visitor_count,
              user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_vu
          union all
          select
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              order_count,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_order
          union all
          select
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_pay
          union all
          select
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_amount,
              refund_order_count,
              refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_ro
          union all
          select
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              refund_payment_count,
              refund_payment_amount
          from tmp_rp
      )t1
      group by province_id;
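The daily load above hard-codes the sample date `2020-06-15`. A scheduled run would normally receive the date as a parameter. The sketch below, assuming GNU `date` is available, defaults to yesterday and checks the format before the value is spliced into SQL. `resolve_do_date` and `render_daily_sql` are hypothetical helper names and are not part of the original scripts.

```shell
#!/bin/bash
# Hypothetical helper: pick the business date for the daily load.
# Uses the first argument if given, otherwise yesterday (GNU date syntax, an assumption).
resolve_do_date() {
    local candidate="${1:-$(date -d '-1 day' +%F)}"
    # Accept only the yyyy-MM-dd shape so the value is safe to splice into a SQL string.
    case "$candidate" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) printf '%s\n' "$candidate" ;;
        *) echo "invalid date: $candidate" >&2; return 1 ;;
    esac
}

# Hypothetical helper: swap the hard-coded sample date in a SQL string for the given date.
render_daily_sql() {
    printf '%s\n' "$1" | sed "s/2020-06-15/$2/g"
}

do_date="$(resolve_do_date 2020-06-16)" || exit 1
render_daily_sql "where dt='2020-06-15'" "$do_date"
# prints: where dt='2020-06-16'
```

In a real job the rendered statement would then be handed to the Hive CLI (for example via `hive -e`). The format check only guards the shape of the value and does not confirm that the partition exists.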
      

# DWS layer first-day data load script

  1. Write the script

    1. Create the script dwd_to_dws_init.sh in the /home/damoncai/bin directory

      #!/bin/bash
      
      APP=gmall
      
      if [ -n "$2" ] ;then
         do_date=$2
      else 
         echo "Please pass in the date argument"
         exit 1
      fi
      
      dws_visitor_action_daycount="
      insert overwrite table ${APP}.dws_visitor_action_daycount partition(dt='$do_date')
      select
          t1.mid_id,
          t1.brand,
          t1.model,
          t1.is_new,
          t1.channel,
          t1.os,
          t1.area_code,
          t1.version_code,
          t1.visit_count,
          t3.page_stats
      from
      (
          select
              mid_id,
              brand,
              model,
              if(array_contains(collect_set(is_new),'0'),'0','1') is_new,--in ods_page_log, within one day the is_new values of one device may be all 1, all 0, or a mix of both (uninstall and reinstall), hence this handling
              collect_set(channel) channel,
              collect_set(os) os,
              collect_set(area_code) area_code,
              collect_set(version_code) version_code,
              sum(if(last_page_id is null,1,0)) visit_count
          from ${APP}.dwd_page_log
          where dt='$do_date'
          and last_page_id is null
          group by mid_id,model,brand
      )t1
      join
      (
          select
              mid_id,
              brand,
              model,
              collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
          from
          (
              select
                  mid_id,
                  brand,
                  model,
                  page_id,
                  count(*) page_count,
                  sum(during_time) during_time
              from ${APP}.dwd_page_log
              where dt='$do_date'
              group by mid_id,model,brand,page_id
          )t2
          group by mid_id,model,brand
      )t3
      on t1.mid_id=t3.mid_id
      and t1.brand=t3.brand
      and t1.model=t3.model;
      "
      
      dws_area_stats_daycount="
      set hive.exec.dynamic.partition.mode=nonstrict;
      with
      tmp_vu as
      (
          select
              dt,
              id province_id,
              visit_count,
              login_count,
              visitor_count,
              user_count
          from
          (
              select
                  dt,
                  area_code,
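                  count(*) visit_count,--number of visits (sessions)
                  count(user_id) login_count,--number of logged-in visits, equivalent to sum(if(user_id is not null,1,0))
                  count(distinct(mid_id)) visitor_count,--number of distinct visitors (devices)
                  count(distinct(user_id)) user_count--number of distinct users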
              from ${APP}.dwd_page_log
              where last_page_id is null
              group by dt,area_code
          )tmp
          left join ${APP}.dim_base_province area
          on tmp.area_code=area.area_code
      ),
      tmp_order as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              province_id,
              count(*) order_count,
              sum(original_amount) order_original_amount,
              sum(final_amount) order_final_amount
          from ${APP}.dwd_order_info
          group by date_format(create_time,'yyyy-MM-dd'),province_id
      ),
      tmp_pay as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              province_id,
              count(*) payment_count,
              sum(payment_amount) payment_amount
          from ${APP}.dwd_payment_info
          group by date_format(callback_time,'yyyy-MM-dd'),province_id
      ),
      tmp_ro as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              province_id,
              count(*) refund_order_count,
              sum(refund_amount) refund_order_amount
          from ${APP}.dwd_order_refund_info
          group by date_format(create_time,'yyyy-MM-dd'),province_id
      ),
      tmp_rp as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              province_id,
              count(*) refund_payment_count,
              sum(refund_amount) refund_payment_amount
          from ${APP}.dwd_refund_payment
          group by date_format(callback_time,'yyyy-MM-dd'),province_id
      )
      insert overwrite table ${APP}.dws_area_stats_daycount partition(dt)
      select
          province_id,
          sum(visit_count),
          sum(login_count),
          sum(visitor_count),
          sum(user_count),
          sum(order_count),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_amount),
          sum(refund_order_count),
          sum(refund_order_amount),
          sum(refund_payment_count),
          sum(refund_payment_amount),
          dt
      from
      (
          select
              dt,
              province_id,
              visit_count,
              login_count,
              visitor_count,
              user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_vu
          union all
          select
              dt,
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              order_count,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_order
          union all
          select
              dt,
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_pay
          union all
          select
              dt,
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_amount,
              refund_order_count,
              refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_amount
          from tmp_ro
          union all
          select
              dt,
              province_id,
              0 visit_count,
              0 login_count,
              0 visitor_count,
              0 user_count,
              0 order_count,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_amount,
              refund_payment_count,
              refund_payment_amount
          from tmp_rp
      )t1
      group by dt,province_id;
      "
      
      dws_user_action_daycount="
      set hive.exec.dynamic.partition.mode=nonstrict;
      with
      tmp_login as
      (
          select
              dt,
              user_id,
              count(*) login_count
          from ${APP}.dwd_page_log
          where user_id is not null
          and last_page_id is null
          group by dt,user_id
      ),
      tmp_cf as
      (
          select
              dt,
              user_id,
              sum(if(action_id='cart_add',1,0)) cart_count,
              sum(if(action_id='favor_add',1,0)) favor_count
          from ${APP}.dwd_action_log
          where user_id is not null
          and action_id in ('cart_add','favor_add')
          group by dt,user_id
      ),
      tmp_order as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              user_id,
              count(*) order_count,
              sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
              sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
              sum(activity_reduce_amount) order_activity_reduce_amount,
              sum(coupon_reduce_amount) order_coupon_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(final_amount) order_final_amount
          from ${APP}.dwd_order_info
          group by date_format(create_time,'yyyy-MM-dd'),user_id
      ),
      tmp_pay as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              user_id,
              count(*) payment_count,
              sum(payment_amount) payment_amount
          from ${APP}.dwd_payment_info
          group by date_format(callback_time,'yyyy-MM-dd'),user_id
      ),
      tmp_ri as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              user_id,
              count(*) refund_order_count,
              sum(refund_num) refund_order_num,
              sum(refund_amount) refund_order_amount
          from ${APP}.dwd_order_refund_info
          group by date_format(create_time,'yyyy-MM-dd'),user_id
      ),
      tmp_rp as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              rp.user_id,
              count(*) refund_payment_count,
              sum(ri.refund_num) refund_payment_num,
              sum(rp.refund_amount) refund_payment_amount
          from
          (
              select
                  user_id,
                  order_id,
                  sku_id,
                  refund_amount,
                  callback_time
              from ${APP}.dwd_refund_payment
          )rp
          left join
          (
              select
                  user_id,
                  order_id,
                  sku_id,
                  refund_num
              from ${APP}.dwd_order_refund_info
          )ri
          on rp.order_id=ri.order_id
          and rp.sku_id=ri.sku_id
          group by date_format(callback_time,'yyyy-MM-dd'),rp.user_id
      ),
      tmp_coupon as
      (
          select
              coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt) dt,
              coalesce(coupon_get.user_id,coupon_using.user_id,coupon_used.user_id) user_id,
              nvl(coupon_get_count,0) coupon_get_count,
              nvl(coupon_using_count,0) coupon_using_count,
              nvl(coupon_used_count,0) coupon_used_count
          from
          (
              select
                  date_format(get_time,'yyyy-MM-dd') dt,
                  user_id,
                  count(*) coupon_get_count
              from ${APP}.dwd_coupon_use
              where get_time is not null
              group by user_id,date_format(get_time,'yyyy-MM-dd')
          )coupon_get
          full outer join
          (
              select
                  date_format(using_time,'yyyy-MM-dd') dt,
                  user_id,
                  count(*) coupon_using_count
              from ${APP}.dwd_coupon_use
              where using_time is not null
              group by user_id,date_format(using_time,'yyyy-MM-dd')
          )coupon_using
          on coupon_get.dt=coupon_using.dt
          and coupon_get.user_id=coupon_using.user_id
          full outer join
          (
              select
                  date_format(used_time,'yyyy-MM-dd') dt,
                  user_id,
                  count(*) coupon_used_count
              from ${APP}.dwd_coupon_use
              where used_time is not null
              group by user_id,date_format(used_time,'yyyy-MM-dd')
          )coupon_used
          on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
          and nvl(coupon_get.user_id,coupon_using.user_id)=coupon_used.user_id
      ),
      tmp_comment as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              user_id,
              sum(if(appraise='1201',1,0)) appraise_good_count,
              sum(if(appraise='1202',1,0)) appraise_mid_count,
              sum(if(appraise='1203',1,0)) appraise_bad_count,
              sum(if(appraise='1204',1,0)) appraise_default_count
          from ${APP}.dwd_comment_info
          group by date_format(create_time,'yyyy-MM-dd'),user_id
      ),
      tmp_od as
      (
          select
              dt,
              user_id,
              collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
          from
          (
              select
                  date_format(create_time,'yyyy-MM-dd') dt,
                  user_id,
                  sku_id,
                  sum(sku_num) sku_num,
                  count(*) order_count,
                  cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
                  cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
                  cast(sum(original_amount) as decimal(16,2)) original_amount,
                  cast(sum(split_final_amount) as decimal(16,2)) final_amount
              from ${APP}.dwd_order_detail
              group by date_format(create_time,'yyyy-MM-dd'),user_id,sku_id
          )t1
          group by dt,user_id
      )
      insert overwrite table ${APP}.dws_user_action_daycount partition(dt)
      select
          coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
          nvl(login_count,0),
          nvl(cart_count,0),
          nvl(favor_count,0),
          nvl(order_count,0),
          nvl(order_activity_count,0),
          nvl(order_activity_reduce_amount,0),
          nvl(order_coupon_count,0),
          nvl(order_coupon_reduce_amount,0),
          nvl(order_original_amount,0),
          nvl(order_final_amount,0),
          nvl(payment_count,0),
          nvl(payment_amount,0),
          nvl(refund_order_count,0),
          nvl(refund_order_num,0),
          nvl(refund_order_amount,0),
          nvl(refund_payment_count,0),
          nvl(refund_payment_num,0),
          nvl(refund_payment_amount,0),
          nvl(coupon_get_count,0),
          nvl(coupon_using_count,0),
          nvl(coupon_used_count,0),
          nvl(appraise_good_count,0),
          nvl(appraise_mid_count,0),
          nvl(appraise_bad_count,0),
          nvl(appraise_default_count,0),
          order_detail_stats,
          coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt,tmp_od.dt)
      from tmp_login
      full outer join tmp_cf
      on tmp_login.user_id=tmp_cf.user_id
      and tmp_login.dt=tmp_cf.dt
      full outer join tmp_order
      on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt)=tmp_order.dt
      full outer join tmp_pay
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt)=tmp_pay.dt
      full outer join tmp_ri
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt)=tmp_ri.dt
      full outer join tmp_rp
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt)=tmp_rp.dt
      full outer join tmp_comment
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt)=tmp_comment.dt
      full outer join tmp_coupon
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt)=tmp_coupon.dt
      full outer join tmp_od
      on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id
      and coalesce(tmp_login.dt,tmp_cf.dt,tmp_order.dt,tmp_pay.dt,tmp_ri.dt,tmp_rp.dt,tmp_comment.dt,tmp_coupon.dt)=tmp_od.dt;
      "
      
      dws_activity_info_daycount="
      set hive.exec.dynamic.partition.mode=nonstrict;
      with
      tmp_order as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              activity_rule_id,
              activity_id,
              count(*) order_count,
              sum(split_activity_amount) order_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(split_final_amount) order_final_amount
          from ${APP}.dwd_order_detail
          where activity_id is not null
          group by date_format(create_time,'yyyy-MM-dd'),activity_rule_id,activity_id
      ),
      tmp_pay as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              activity_rule_id,
              activity_id,
              count(*) payment_count,
              sum(split_activity_amount) payment_reduce_amount,
              sum(split_final_amount) payment_amount
          from
          (
              select
                  activity_rule_id,
                  activity_id,
                  order_id,
                  split_activity_amount,
                  split_final_amount
              from ${APP}.dwd_order_detail
              where activity_id is not null
          )od
          join
          (
              select
                  order_id,
                  callback_time
              from ${APP}.dwd_payment_info
              where callback_time is not null
          )pi
          on od.order_id=pi.order_id
          group by date_format(callback_time,'yyyy-MM-dd'),activity_rule_id,activity_id
      )
      insert overwrite table ${APP}.dws_activity_info_daycount partition(dt)
      select
          activity_rule_id,
          activity_id,
          sum(order_count),
          sum(order_reduce_amount),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_reduce_amount),
          sum(payment_amount),
          dt
      from
      (
          select
              dt,
              activity_rule_id,
              activity_id,
              order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_reduce_amount,
              0 payment_amount
          from tmp_order
          union all
          select
              dt,
              activity_rule_id,
              activity_id,
              0 order_count,
              0 order_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              payment_reduce_amount,
              payment_amount
          from tmp_pay
      )t1
      group by dt,activity_rule_id,activity_id;"
      
      dws_sku_action_daycount="
      set hive.exec.dynamic.partition.mode=nonstrict;
      with
      tmp_order as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              sku_id,
              count(*) order_count,
              sum(sku_num) order_num,
              sum(if(split_activity_amount>0,1,0)) order_activity_count,
              sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
              sum(split_activity_amount) order_activity_reduce_amount,
              sum(split_coupon_amount) order_coupon_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(split_final_amount) order_final_amount
          from ${APP}.dwd_order_detail
          group by date_format(create_time,'yyyy-MM-dd'),sku_id
      ),
      tmp_pay as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              sku_id,
              count(*) payment_count,
              sum(sku_num) payment_num,
              sum(split_final_amount) payment_amount
          from ${APP}.dwd_order_detail od
          join
          (
              select
                  order_id,
                  callback_time
              from ${APP}.dwd_payment_info
              where callback_time is not null
          )pi on pi.order_id=od.order_id
          group by date_format(callback_time,'yyyy-MM-dd'),sku_id
      ),
      tmp_ri as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              sku_id,
              count(*) refund_order_count,
              sum(refund_num) refund_order_num,
              sum(refund_amount) refund_order_amount
          from ${APP}.dwd_order_refund_info
          group by date_format(create_time,'yyyy-MM-dd'),sku_id
      ),
      tmp_rp as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              rp.sku_id,
              count(*) refund_payment_count,
              sum(ri.refund_num) refund_payment_num,
              sum(refund_amount) refund_payment_amount
          from
          (
              select
                  order_id,
                  sku_id,
                  refund_amount,
                  callback_time
              from ${APP}.dwd_refund_payment
          )rp
          left join
          (
              select
                  order_id,
                  sku_id,
                  refund_num
              from ${APP}.dwd_order_refund_info
          )ri
          on rp.order_id=ri.order_id
          and rp.sku_id=ri.sku_id
          group by date_format(callback_time,'yyyy-MM-dd'),rp.sku_id
      ),
      tmp_cf as
      (
          select
              dt,
              item sku_id,
              sum(if(action_id='cart_add',1,0)) cart_count,
              sum(if(action_id='favor_add',1,0)) favor_count
          from ${APP}.dwd_action_log
          where action_id in ('cart_add','favor_add')
          group by dt,item
      ),
      tmp_comment as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              sku_id,
              sum(if(appraise='1201',1,0)) appraise_good_count,
              sum(if(appraise='1202',1,0)) appraise_mid_count,
              sum(if(appraise='1203',1,0)) appraise_bad_count,
              sum(if(appraise='1204',1,0)) appraise_default_count
          from ${APP}.dwd_comment_info
          group by date_format(create_time,'yyyy-MM-dd'),sku_id
      )
      insert overwrite table ${APP}.dws_sku_action_daycount partition(dt)
      select
          sku_id,
          sum(order_count),
          sum(order_num),
          sum(order_activity_count),
          sum(order_coupon_count),
          sum(order_activity_reduce_amount),
          sum(order_coupon_reduce_amount),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_num),
          sum(payment_amount),
          sum(refund_order_count),
          sum(refund_order_num),
          sum(refund_order_amount),
          sum(refund_payment_count),
          sum(refund_payment_num),
          sum(refund_payment_amount),
          sum(cart_count),
          sum(favor_count),
          sum(appraise_good_count),
          sum(appraise_mid_count),
          sum(appraise_bad_count),
          sum(appraise_default_count),
          dt
      from
      (
          select
              dt,
              sku_id,
              order_count,
              order_num,
              order_activity_count,
              order_coupon_count,
              order_activity_reduce_amount,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_order
          union all
          select
              dt,
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              payment_num,
              payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_pay
          union all
          select
              dt,
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_ri
          union all
          select
              dt,
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              0 cart_count,
              0 favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_rp
          union all
          select
              dt,
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              cart_count,
              favor_count,
              0 appraise_good_count,
              0 appraise_mid_count,
              0 appraise_bad_count,
              0 appraise_default_count
          from tmp_cf
          union all
          select
              dt,
              sku_id,
              0 order_count,
              0 order_num,
              0 order_activity_count,
              0 order_coupon_count,
              0 order_activity_reduce_amount,
              0 order_coupon_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              0 payment_num,
              0 payment_amount,
              0 refund_order_count,
              0 refund_order_num,
              0 refund_order_amount,
              0 refund_payment_count,
              0 refund_payment_num,
              0 refund_payment_amount,
              0 cart_count,
              0 favor_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from tmp_comment
      )t1
      group by dt,sku_id;"
      
      dws_coupon_info_daycount="
      set hive.exec.dynamic.partition.mode=nonstrict;
      with
      tmp_cu as
      (
          select
              coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt,coupon_exprie.dt) dt,
              coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id,coupon_exprie.coupon_id) coupon_id,
              nvl(get_count,0) get_count,
              nvl(order_count,0) order_count,
              nvl(payment_count,0) payment_count,
              nvl(expire_count,0) expire_count
          from
          (
              select
                  date_format(get_time,'yyyy-MM-dd') dt,
                  coupon_id,
                  count(*) get_count
              from ${APP}.dwd_coupon_use
              group by date_format(get_time,'yyyy-MM-dd'),coupon_id
          )coupon_get
          full outer join
          (
              select
                  date_format(using_time,'yyyy-MM-dd') dt,
                  coupon_id,
                  count(*) order_count
              from ${APP}.dwd_coupon_use
              where using_time is not null
              group by date_format(using_time,'yyyy-MM-dd'),coupon_id
          )coupon_using
          on coupon_get.dt=coupon_using.dt
          and coupon_get.coupon_id=coupon_using.coupon_id
          full outer join
          (
              select
                  date_format(used_time,'yyyy-MM-dd') dt,
                  coupon_id,
                  count(*) payment_count
              from ${APP}.dwd_coupon_use
              where used_time is not null
              group by date_format(used_time,'yyyy-MM-dd'),coupon_id
          )coupon_used
          on nvl(coupon_get.dt,coupon_using.dt)=coupon_used.dt
          and nvl(coupon_get.coupon_id,coupon_using.coupon_id)=coupon_used.coupon_id
          full outer join
          (
              select
                  date_format(expire_time,'yyyy-MM-dd') dt,
                  coupon_id,
                  count(*) expire_count
              from ${APP}.dwd_coupon_use
              where expire_time is not null
              group by date_format(expire_time,'yyyy-MM-dd'),coupon_id
          )coupon_exprie
          on coalesce(coupon_get.dt,coupon_using.dt,coupon_used.dt)=coupon_exprie.dt
          and coalesce(coupon_get.coupon_id,coupon_using.coupon_id,coupon_used.coupon_id)=coupon_exprie.coupon_id
      ),
      tmp_order as
      (
          select
              date_format(create_time,'yyyy-MM-dd') dt,
              coupon_id,
              sum(split_coupon_amount) order_reduce_amount,
              sum(original_amount) order_original_amount,
              sum(split_final_amount) order_final_amount
          from ${APP}.dwd_order_detail
          where coupon_id is not null
          group by date_format(create_time,'yyyy-MM-dd'),coupon_id
      ),
      tmp_pay as
      (
          select
              date_format(callback_time,'yyyy-MM-dd') dt,
              coupon_id,
              sum(split_coupon_amount) payment_reduce_amount,
              sum(split_final_amount) payment_amount
          from
          (
              select
                  order_id,
                  coupon_id,
                  split_coupon_amount,
                  split_final_amount
              from ${APP}.dwd_order_detail
              where coupon_id is not null
          )od
          join
          (
              select
                  order_id,
                  callback_time
              from ${APP}.dwd_payment_info
              where callback_time is not null
          )pi
          on od.order_id=pi.order_id
          group by date_format(callback_time,'yyyy-MM-dd'),coupon_id
      )
      insert overwrite table ${APP}.dws_coupon_info_daycount partition(dt)
      select
          coupon_id,
          sum(get_count),
          sum(order_count),
          sum(order_reduce_amount),
          sum(order_original_amount),
          sum(order_final_amount),
          sum(payment_count),
          sum(payment_reduce_amount),
          sum(payment_amount),
          sum(expire_count),
          dt
      from
      (
          select
              dt,
              coupon_id,
              get_count,
              order_count,
              0 order_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              payment_count,
              0 payment_reduce_amount,
              0 payment_amount,
              expire_count
          from tmp_cu
          union all
          select
              dt,
              coupon_id,
              0 get_count,
              0 order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              0 payment_count,
              0 payment_reduce_amount,
              0 payment_amount,
              0 expire_count
          from tmp_order
          union all
          select
              dt,
              coupon_id,
              0 get_count,
              0 order_count,
              0 order_reduce_amount,
              0 order_original_amount,
              0 order_final_amount,
              0 payment_count,
              payment_reduce_amount,
              payment_amount,
              0 expire_count
          from tmp_pay
      )t1
      group by dt,coupon_id;
      "
      
      case $1 in
          "dws_visitor_action_daycount" )
              hive -e "$dws_visitor_action_daycount"
          ;;
          "dws_user_action_daycount" )
              hive -e "$dws_user_action_daycount"
          ;;
          "dws_activity_info_daycount" )
              hive -e "$dws_activity_info_daycount"
          ;;
          "dws_area_stats_daycount" )
              hive -e "$dws_area_stats_daycount"
          ;;
          "dws_sku_action_daycount" )
              hive -e "$dws_sku_action_daycount"
          ;;
          "dws_coupon_info_daycount" )
              hive -e "$dws_coupon_info_daycount"
          ;;
          "all" )
              hive -e "$dws_visitor_action_daycount$dws_user_action_daycount$dws_activity_info_daycount$dws_area_stats_daycount$dws_sku_action_daycount$dws_coupon_info_daycount"
          ;;
      esac
      
    2. Add execute permission

    3. Run the script

      dwd_to_dws_init.sh all 2020-06-14
      
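The script selects which SQL to run from its first argument, and `all` concatenates every statement into a single `hive -e` call. The following is a dry-run sketch of that dispatch pattern, not the script itself: `run_sql` is a hypothetical stand-in that only prints instead of calling `hive -e`, the SQL strings are placeholders, and the `*` fallback branch is an addition for illustration (the `case` block above has none, so an unrecognized table name does nothing and still exits successfully).

```shell
#!/bin/bash
# Dry-run sketch of the table dispatch used by dwd_to_dws_init.sh.
# Assumptions: run_sql is hypothetical and only prints; the SQL strings are placeholders.

run_sql() {
    printf 'would run: %s\n' "$1"
}

dws_activity_info_daycount="select 'activity';"
dws_sku_action_daycount="select 'sku';"

dispatch() {
    case $1 in
        "dws_activity_info_daycount" )
            run_sql "$dws_activity_info_daycount"
        ;;
        "dws_sku_action_daycount" )
            run_sql "$dws_sku_action_daycount"
        ;;
        "all" )
            # Each statement ends with ';', so the strings can be joined directly.
            run_sql "$dws_activity_info_daycount$dws_sku_action_daycount"
        ;;
        * )
            # Added for illustration: report unknown names instead of silently doing nothing.
            echo "unknown table: $1" >&2
            return 1
        ;;
    esac
}

dispatch all
```

Running one table at a time (for example `dwd_to_dws_init.sh dws_sku_action_daycount 2020-06-14`) is a convenient way to rerun a single failed load without repeating the others. For step 2, a `chmod +x dwd_to_dws_init.sh` in the bin directory, as was done for `xsync` earlier, should be sufficient.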

# DWS Layer Daily Data Load Script

  1. Create the script dwd_to_dws.sh in the /home/damoncai/bin directory

    #!/bin/bash
    
    APP=gmall
    # Use the date passed as the second argument if one is given; otherwise default to yesterday
    if [ -n "$2" ] ;then
        do_date=$2
    else 
        do_date=`date -d "-1 day" +%F`
    fi
    
    dws_visitor_action_daycount="insert overwrite table ${APP}.dws_visitor_action_daycount partition(dt='$do_date')
    select
        t1.mid_id,
        t1.brand,
        t1.model,
        t1.is_new,
        t1.channel,
        t1.os,
        t1.area_code,
        t1.version_code,
        t1.visit_count,
        t3.page_stats
    from
    (
        select
            mid_id,
            brand,
            model,
            if(array_contains(collect_set(is_new),'0'),'0','1') is_new,--In ods_page_log, within one day the is_new field for a single device may be all 1, all 0, or a mix of 0 and 1 (uninstall and reinstall), hence this handling
            collect_set(channel) channel,
            collect_set(os) os,
            collect_set(area_code) area_code,
            collect_set(version_code) version_code,
            sum(if(last_page_id is null,1,0)) visit_count
        from ${APP}.dwd_page_log
        where dt='$do_date'
        and last_page_id is null
        group by mid_id,model,brand
    )t1
    join
    (
        select
            mid_id,
            brand,
            model,
            collect_set(named_struct('page_id',page_id,'page_count',page_count,'during_time',during_time)) page_stats
        from
        (
            select
                mid_id,
                brand,
                model,
                page_id,
                count(*) page_count,
                sum(during_time) during_time
            from ${APP}.dwd_page_log
            where dt='$do_date'
            group by mid_id,model,brand,page_id
        )t2
        group by mid_id,model,brand
    )t3
    on t1.mid_id=t3.mid_id
    and t1.brand=t3.brand
    and t1.model=t3.model;"
    
    dws_user_action_daycount="
    with
    tmp_login as
    (
        select
            user_id,
            count(*) login_count
        from ${APP}.dwd_page_log
        where dt='$do_date'
        and user_id is not null
        and last_page_id is null
        group by user_id
    ),
    tmp_cf as
    (
        select
            user_id,
            sum(if(action_id='cart_add',1,0)) cart_count,
            sum(if(action_id='favor_add',1,0)) favor_count
        from ${APP}.dwd_action_log
        where dt='$do_date'
        and user_id is not null
        and action_id in ('cart_add','favor_add')
        group by user_id
    ),
    tmp_order as
    (
        select
            user_id,
            count(*) order_count,
            sum(if(activity_reduce_amount>0,1,0)) order_activity_count,
            sum(if(coupon_reduce_amount>0,1,0)) order_coupon_count,
            sum(activity_reduce_amount) order_activity_reduce_amount,
            sum(coupon_reduce_amount) order_coupon_reduce_amount,
            sum(original_amount) order_original_amount,
            sum(final_amount) order_final_amount
        from ${APP}.dwd_order_info
        where (dt='$do_date'
        or dt='9999-99-99')
        and date_format(create_time,'yyyy-MM-dd')='$do_date'
        group by user_id
    ),
    tmp_pay as
    (
        select
            user_id,
            count(*) payment_count,
            sum(payment_amount) payment_amount
        from ${APP}.dwd_payment_info
        where dt='$do_date'
        group by user_id
    ),
    tmp_ri as
    (
        select
            user_id,
            count(*) refund_order_count,
            sum(refund_num) refund_order_num,
            sum(refund_amount) refund_order_amount
        from ${APP}.dwd_order_refund_info
        where dt='$do_date'
        group by user_id
    ),
    tmp_rp as
    (
        select
            rp.user_id,
            count(*) refund_payment_count,
            sum(ri.refund_num) refund_payment_num,
            sum(rp.refund_amount) refund_payment_amount
        from
        (
            select
                user_id,
                order_id,
                sku_id,
                refund_amount
            from ${APP}.dwd_refund_payment
            where dt='$do_date'
        )rp
        left join
        (
            select
                user_id,
                order_id,
                sku_id,
                refund_num
            from ${APP}.dwd_order_refund_info
            where dt>=date_add('$do_date',-15)
        )ri
        on rp.order_id=ri.order_id
        and rp.sku_id=ri.sku_id
        group by rp.user_id
    ),
    tmp_coupon as
    (
        select
            user_id,
            sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_get_count,
            sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_using_count,
            sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) coupon_used_count
        from ${APP}.dwd_coupon_use
        where (dt='$do_date' or dt='9999-99-99')
        and (date_format(get_time, 'yyyy-MM-dd') = '$do_date'
        or date_format(using_time,'yyyy-MM-dd')='$do_date'
        or date_format(used_time,'yyyy-MM-dd')='$do_date')
        group by user_id
    ),
    tmp_comment as
    (
        select
            user_id,
            sum(if(appraise='1201',1,0)) appraise_good_count,
            sum(if(appraise='1202',1,0)) appraise_mid_count,
            sum(if(appraise='1203',1,0)) appraise_bad_count,
            sum(if(appraise='1204',1,0)) appraise_default_count
        from ${APP}.dwd_comment_info
        where dt='$do_date'
        group by user_id
    ),
    tmp_od as
    (
        select
            user_id,
            collect_set(named_struct('sku_id',sku_id,'sku_num',sku_num,'order_count',order_count,'activity_reduce_amount',activity_reduce_amount,'coupon_reduce_amount',coupon_reduce_amount,'original_amount',original_amount,'final_amount',final_amount)) order_detail_stats
        from
        (
            select
                user_id,
                sku_id,
                sum(sku_num) sku_num,
                count(*) order_count,
                cast(sum(split_activity_amount) as decimal(16,2)) activity_reduce_amount,
                cast(sum(split_coupon_amount) as decimal(16,2)) coupon_reduce_amount,
                cast(sum(original_amount) as decimal(16,2)) original_amount,
                cast(sum(split_final_amount) as decimal(16,2)) final_amount
            from ${APP}.dwd_order_detail
            where dt='$do_date'
            group by user_id,sku_id
        )t1
        group by user_id
    )
    insert overwrite table ${APP}.dws_user_action_daycount partition(dt='$do_date')
    select
        coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id,tmp_od.user_id),
        nvl(login_count,0),
        nvl(cart_count,0),
        nvl(favor_count,0),
        nvl(order_count,0),
        nvl(order_activity_count,0),
        nvl(order_activity_reduce_amount,0),
        nvl(order_coupon_count,0),
        nvl(order_coupon_reduce_amount,0),
        nvl(order_original_amount,0),
        nvl(order_final_amount,0),
        nvl(payment_count,0),
        nvl(payment_amount,0),
        nvl(refund_order_count,0),
        nvl(refund_order_num,0),
        nvl(refund_order_amount,0),
        nvl(refund_payment_count,0),
        nvl(refund_payment_num,0),
        nvl(refund_payment_amount,0),
        nvl(coupon_get_count,0),
        nvl(coupon_using_count,0),
        nvl(coupon_used_count,0),
        nvl(appraise_good_count,0),
        nvl(appraise_mid_count,0),
        nvl(appraise_bad_count,0),
        nvl(appraise_default_count,0),
        order_detail_stats
    from tmp_login
    full outer join tmp_cf on tmp_login.user_id=tmp_cf.user_id
    full outer join tmp_order on coalesce(tmp_login.user_id,tmp_cf.user_id)=tmp_order.user_id
    full outer join tmp_pay on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id)=tmp_pay.user_id
    full outer join tmp_ri on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id)=tmp_ri.user_id
    full outer join tmp_rp on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id)=tmp_rp.user_id
    full outer join tmp_comment on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id)=tmp_comment.user_id
    full outer join tmp_coupon on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id)=tmp_coupon.user_id
    full outer join tmp_od on coalesce(tmp_login.user_id,tmp_cf.user_id,tmp_order.user_id,tmp_pay.user_id,tmp_ri.user_id,tmp_rp.user_id,tmp_comment.user_id,tmp_coupon.user_id)=tmp_od.user_id;
    "
    
    
    dws_activity_info_daycount="
    with
    tmp_order as
    (
        select
            activity_rule_id,
            activity_id,
            count(*) order_count,
            sum(split_activity_amount) order_reduce_amount,
            sum(original_amount) order_original_amount,
            sum(split_final_amount) order_final_amount
        from ${APP}.dwd_order_detail
        where dt='$do_date'
        and activity_id is not null
        group by activity_rule_id,activity_id
    ),
    tmp_pay as
    (
        select
            activity_rule_id,
            activity_id,
            count(*) payment_count,
            sum(split_activity_amount) payment_reduce_amount,
            sum(split_final_amount) payment_amount
        from ${APP}.dwd_order_detail
        where (dt='$do_date'
        or dt=date_add('$do_date',-1))
        and activity_id is not null
        and order_id in
        (
            select order_id from ${APP}.dwd_payment_info where dt='$do_date'
        )
        group by activity_rule_id,activity_id
    )
    insert overwrite table ${APP}.dws_activity_info_daycount partition(dt='$do_date')
    select
        activity_rule_id,
        activity_id,
        sum(order_count),
        sum(order_reduce_amount),
        sum(order_original_amount),
        sum(order_final_amount),
        sum(payment_count),
        sum(payment_reduce_amount),
        sum(payment_amount)
    from
    (
        select
            activity_rule_id,
            activity_id,
            order_count,
            order_reduce_amount,
            order_original_amount,
            order_final_amount,
            0 payment_count,
            0 payment_reduce_amount,
            0 payment_amount
        from tmp_order
        union all
        select
            activity_rule_id,
            activity_id,
            0 order_count,
            0 order_reduce_amount,
            0 order_original_amount,
            0 order_final_amount,
            payment_count,
            payment_reduce_amount,
            payment_amount
        from tmp_pay
    )t1
    group by activity_rule_id,activity_id;"
    
    
    dws_sku_action_daycount="
    with
    tmp_order as
    (
        select
            sku_id,
            count(*) order_count,
            sum(sku_num) order_num,
            sum(if(split_activity_amount>0,1,0)) order_activity_count,
            sum(if(split_coupon_amount>0,1,0)) order_coupon_count,
            sum(split_activity_amount) order_activity_reduce_amount,
            sum(split_coupon_amount) order_coupon_reduce_amount,
            sum(original_amount) order_original_amount,
            sum(split_final_amount) order_final_amount
        from ${APP}.dwd_order_detail
        where dt='$do_date'
        group by sku_id
    ),
    tmp_pay as
    (
        select
            sku_id,
            count(*) payment_count,
            sum(sku_num) payment_num,
            sum(split_final_amount) payment_amount
        from ${APP}.dwd_order_detail
        where (dt='$do_date'
        or dt=date_add('$do_date',-1))
        and order_id in
        (
            select order_id from ${APP}.dwd_payment_info where dt='$do_date'
        )
        group by sku_id
    ),
    tmp_ri as
    (
        select
            sku_id,
            count(*) refund_order_count,
            sum(refund_num) refund_order_num,
            sum(refund_amount) refund_order_amount
        from ${APP}.dwd_order_refund_info
        where dt='$do_date'
        group by sku_id
    ),
    tmp_rp as
    (
        select
            rp.sku_id,
            count(*) refund_payment_count,
            sum(ri.refund_num) refund_payment_num,
            sum(refund_amount) refund_payment_amount
        from
        (
            select
                order_id,
                sku_id,
                refund_amount
            from ${APP}.dwd_refund_payment
            where dt='$do_date'
        )rp
        left join
        (
            select
                order_id,
                sku_id,
                refund_num
            from ${APP}.dwd_order_refund_info
            where dt>=date_add('$do_date',-15)
        )ri
        on rp.order_id=ri.order_id
        and rp.sku_id=ri.sku_id
        group by rp.sku_id
    ),
    tmp_cf as
    (
        select
            item sku_id,
            sum(if(action_id='cart_add',1,0)) cart_count,
            sum(if(action_id='favor_add',1,0)) favor_count
        from ${APP}.dwd_action_log
        where dt='$do_date'
        and action_id in ('cart_add','favor_add')
        group by item
    ),
    tmp_comment as
    (
        select
            sku_id,
            sum(if(appraise='1201',1,0)) appraise_good_count,
            sum(if(appraise='1202',1,0)) appraise_mid_count,
            sum(if(appraise='1203',1,0)) appraise_bad_count,
            sum(if(appraise='1204',1,0)) appraise_default_count
        from ${APP}.dwd_comment_info
        where dt='$do_date'
        group by sku_id
    )
    insert overwrite table ${APP}.dws_sku_action_daycount partition(dt='$do_date')
    select
        sku_id,
        sum(order_count),
        sum(order_num),
        sum(order_activity_count),
        sum(order_coupon_count),
        sum(order_activity_reduce_amount),
        sum(order_coupon_reduce_amount),
        sum(order_original_amount),
        sum(order_final_amount),
        sum(payment_count),
        sum(payment_num),
        sum(payment_amount),
        sum(refund_order_count),
        sum(refund_order_num),
        sum(refund_order_amount),
        sum(refund_payment_count),
        sum(refund_payment_num),
        sum(refund_payment_amount),
        sum(cart_count),
        sum(favor_count),
        sum(appraise_good_count),
        sum(appraise_mid_count),
        sum(appraise_bad_count),
        sum(appraise_default_count)
    from
    (
        select
            sku_id,
            order_count,
            order_num,
            order_activity_count,
            order_coupon_count,
            order_activity_reduce_amount,
            order_coupon_reduce_amount,
            order_original_amount,
            order_final_amount,
            0 payment_count,
            0 payment_num,
            0 payment_amount,
            0 refund_order_count,
            0 refund_order_num,
            0 refund_order_amount,
            0 refund_payment_count,
            0 refund_payment_num,
            0 refund_payment_amount,
            0 cart_count,
            0 favor_count,
            0 appraise_good_count,
            0 appraise_mid_count,
            0 appraise_bad_count,
            0 appraise_default_count
        from tmp_order
        union all
        select
            sku_id,
            0 order_count,
            0 order_num,
            0 order_activity_count,
            0 order_coupon_count,
            0 order_activity_reduce_amount,
            0 order_coupon_reduce_amount,
            0 order_original_amount,
            0 order_final_amount,
            payment_count,
            payment_num,
            payment_amount,
            0 refund_order_count,
            0 refund_order_num,
            0 refund_order_amount,
            0 refund_payment_count,
            0 refund_payment_num,
            0 refund_payment_amount,
            0 cart_count,
            0 favor_count,
            0 appraise_good_count,
            0 appraise_mid_count,
            0 appraise_bad_count,
            0 appraise_default_count
        from tmp_pay
        union all
        select
            sku_id,
            0 order_count,
            0 order_num,
            0 order_activity_count,
            0 order_coupon_count,
            0 order_activity_reduce_amount,
            0 order_coupon_reduce_amount,
            0 order_original_amount,
            0 order_final_amount,
            0 payment_count,
            0 payment_num,
            0 payment_amount,
            refund_order_count,
            refund_order_num,
            refund_order_amount,
            0 refund_payment_count,
            0 refund_payment_num,
            0 refund_payment_amount,
            0 cart_count,
            0 favor_count,
            0 appraise_good_count,
            0 appraise_mid_count,
            0 appraise_bad_count,
            0 appraise_default_count
        from tmp_ri
        union all
        select
            sku_id,
            0 order_count,
            0 order_num,
            0 order_activity_count,
            0 order_coupon_count,
            0 order_activity_reduce_amount,
            0 order_coupon_reduce_amount,
            0 order_original_amount,
            0 order_final_amount,
            0 payment_count,
            0 payment_num,
            0 payment_amount,
            0 refund_order_count,
            0 refund_order_num,
            0 refund_order_amount,
            refund_payment_count,
            refund_payment_num,
            refund_payment_amount,
            0 cart_count,
            0 favor_count,
            0 appraise_good_count,
            0 appraise_mid_count,
            0 appraise_bad_count,
            0 appraise_default_count
        from tmp_rp
        union all
        select
            sku_id,
            0 order_count,
            0 order_num,
            0 order_activity_count,
            0 order_coupon_count,
            0 order_activity_reduce_amount,
            0 order_coupon_reduce_amount,
            0 order_original_amount,
            0 order_final_amount,
            0 payment_count,
            0 payment_num,
            0 payment_amount,
            0 refund_order_count,
            0 refund_order_num,
            0 refund_order_amount,
            0 refund_payment_count,
            0 refund_payment_num,
            0 refund_payment_amount,
            cart_count,
            favor_count,
            0 appraise_good_count,
            0 appraise_mid_count,
            0 appraise_bad_count,
            0 appraise_default_count
        from tmp_cf
        union all
        select
            sku_id,
            0 order_count,
            0 order_num,
            0 order_activity_count,
            0 order_coupon_count,
            0 order_activity_reduce_amount,
            0 order_coupon_reduce_amount,
            0 order_original_amount,
            0 order_final_amount,
            0 payment_count,
            0 payment_num,
            0 payment_amount,
            0 refund_order_count,
            0 refund_order_num,
            0 refund_order_amount,
            0 refund_payment_count,
            0 refund_payment_num,
            0 refund_payment_amount,
            0 cart_count,
            0 favor_count,
            appraise_good_count,
            appraise_mid_count,
            appraise_bad_count,
            appraise_default_count
        from tmp_comment
    )t1
    group by sku_id;"
    
    dws_coupon_info_daycount="
    with
    tmp_cu as
    (
        select
            coupon_id,
            sum(if(date_format(get_time,'yyyy-MM-dd')='$do_date',1,0)) get_count,
            sum(if(date_format(using_time,'yyyy-MM-dd')='$do_date',1,0)) order_count,
            sum(if(date_format(used_time,'yyyy-MM-dd')='$do_date',1,0)) payment_count,
            sum(if(date_format(expire_time,'yyyy-MM-dd')='$do_date',1,0)) expire_count
        from ${APP}.dwd_coupon_use
        where dt='9999-99-99'
        or dt='$do_date'
        group by coupon_id
    ),
    tmp_order as
    (
        select
            coupon_id,
            sum(split_coupon_amount) order_reduce_amount,
            sum(original_amount) order_original_amount,
            sum(split_final_amount) order_final_amount
        from ${APP}.dwd_order_detail
        where dt='$do_date'
        and coupon_id is not null
        group by coupon_id
    ),
    tmp_pay as
    (
        select
            coupon_id,
            sum(split_coupon_amount) payment_reduce_amount,
            sum(split_final_amount) payment_amount
        from ${APP}.dwd_order_detail
        where (dt='$do_date'
        or dt=date_add('$do_date',-1))
        and coupon_id is not null
        and order_id in
        (
            select order_id from ${APP}.dwd_payment_info where dt='$do_date'
        )
        group by coupon_id
    )
    insert overwrite table ${APP}.dws_coupon_info_daycount partition(dt='$do_date')
    select
        coupon_id,
        sum(get_count),
        sum(order_count),
        sum(order_reduce_amount),
        sum(order_original_amount),
        sum(order_final_amount),
        sum(payment_count),
        sum(payment_reduce_amount),
        sum(payment_amount),
        sum(expire_count)
    from
    (
        select
            coupon_id,
            get_count,
            order_count,
            0 order_reduce_amount,
            0 order_original_amount,
            0 order_final_amount,
            payment_count,
            0 payment_reduce_amount,
            0 payment_amount,
            expire_count
        from tmp_cu
        union all
        select
            coupon_id,
            0 get_count,
            0 order_count,
            order_reduce_amount,
            order_original_amount,
            order_final_amount,
            0 payment_count,
            0 payment_reduce_amount,
            0 payment_amount,
            0 expire_count
        from tmp_order
        union all
        select
            coupon_id,
            0 get_count,
            0 order_count,
            0 order_reduce_amount,
            0 order_original_amount,
            0 order_final_amount,
            0 payment_count,
            payment_reduce_amount,
            payment_amount,
            0 expire_count
        from tmp_pay
    )t1
    group by coupon_id;"
    
    
    dws_area_stats_daycount="
    with
    tmp_vu as
    (
        select
            id province_id,
            visit_count,
            login_count,
            visitor_count,
            user_count
        from
        (
            select
                area_code,
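                count(*) visit_count, -- number of visits by all visitors
                count(user_id) login_count, -- number of visits by logged-in users; equivalent to sum(if(user_id is not null,1,0))
                count(distinct(mid_id)) visitor_count, -- number of distinct visitors (devices)
                count(distinct(user_id)) user_count -- number of distinct users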
            from ${APP}.dwd_page_log
            where dt='$do_date'
            and last_page_id is null
            group by area_code
        )tmp
        left join ${APP}.dim_base_province area
        on tmp.area_code=area.area_code
    ),
    tmp_order as
    (
        select
            province_id,
            count(*) order_count,
            sum(original_amount) order_original_amount,
            sum(final_amount) order_final_amount
        from ${APP}.dwd_order_info
        where (dt='$do_date'
        or dt='9999-99-99')
        and date_format(create_time,'yyyy-MM-dd')='$do_date'
        group by province_id
    ),
    tmp_pay as
    (
        select
            province_id,
            count(*) payment_count,
            sum(payment_amount) payment_amount
        from ${APP}.dwd_payment_info
        where dt='$do_date'
        group by province_id
    ),
    tmp_ro as
    (
        select
            province_id,
            count(*) refund_order_count,
            sum(refund_amount) refund_order_amount
        from ${APP}.dwd_order_refund_info
        where dt='$do_date'
        group by province_id
    ),
    tmp_rp as
    (
        select
            province_id,
            count(*) refund_payment_count,
            sum(refund_amount) refund_payment_amount
        from ${APP}.dwd_refund_payment
        where dt='$do_date'
        group by province_id
    )
    insert overwrite table ${APP}.dws_area_stats_daycount partition(dt='$do_date')
    select
        province_id,
        sum(visit_count),
        sum(login_count),
        sum(visitor_count),
        sum(user_count),
        sum(order_count),
        sum(order_original_amount),
        sum(order_final_amount),
        sum(payment_count),
        sum(payment_amount),
        sum(refund_order_count),
        sum(refund_order_amount),
        sum(refund_payment_count),
        sum(refund_payment_amount)
    from
    (
        select
            province_id,
            visit_count,
            login_count,
            visitor_count,
            user_count,
            0 order_count,
            0 order_original_amount,
            0 order_final_amount,
            0 payment_count,
            0 payment_amount,
            0 refund_order_count,
            0 refund_order_amount,
            0 refund_payment_count,
            0 refund_payment_amount
        from tmp_vu
        union all
        select
            province_id,
            0 visit_count,
            0 login_count,
            0 visitor_count,
            0 user_count,
            order_count,
            order_original_amount,
            order_final_amount,
            0 payment_count,
            0 payment_amount,
            0 refund_order_count,
            0 refund_order_amount,
            0 refund_payment_count,
            0 refund_payment_amount
        from tmp_order
        union all
        select
            province_id,
            0 visit_count,
            0 login_count,
            0 visitor_count,
            0 user_count,
            0 order_count,
            0 order_original_amount,
            0 order_final_amount,
            payment_count,
            payment_amount,
            0 refund_order_count,
            0 refund_order_amount,
            0 refund_payment_count,
            0 refund_payment_amount
        from tmp_pay
        union all
        select
            province_id,
            0 visit_count,
            0 login_count,
            0 visitor_count,
            0 user_count,
            0 order_count,
            0 order_original_amount,
            0 order_final_amount,
            0 payment_count,
            0 payment_amount,
            refund_order_count,
            refund_order_amount,
            0 refund_payment_count,
            0 refund_payment_amount
        from tmp_ro
        union all
        select
            province_id,
            0 visit_count,
            0 login_count,
            0 visitor_count,
            0 user_count,
            0 order_count,
            0 order_original_amount,
            0 order_final_amount,
            0 payment_count,
            0 payment_amount,
            0 refund_order_count,
            0 refund_order_amount,
            refund_payment_count,
            refund_payment_amount
        from tmp_rp
    )t1
    group by province_id;"
    
    case $1 in
        "dws_visitor_action_daycount" )
            hive -e "$dws_visitor_action_daycount"
        ;;
        "dws_user_action_daycount" )
            hive -e "$dws_user_action_daycount"
        ;;
        "dws_activity_info_daycount" )
            hive -e "$dws_activity_info_daycount"
        ;;
        "dws_area_stats_daycount" )
            hive -e "$dws_area_stats_daycount"
        ;;
        "dws_sku_action_daycount" )
            hive -e "$dws_sku_action_daycount"
        ;;
        "dws_coupon_info_daycount" )
            hive -e "$dws_coupon_info_daycount"
        ;;
        "all" )
            hive -e "$dws_visitor_action_daycount$dws_user_action_daycount$dws_activity_info_daycount$dws_area_stats_daycount$dws_sku_action_daycount$dws_coupon_info_daycount"
        ;;
    esac
    
  2. Add execute permission

  3. Run the script

    dwd_to_dws.sh all 2020-06-14
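
The `case` statement in `dwd_to_dws.sh` has no default branch, so a misspelled table name makes the script exit without running any query and without printing an error. Below is a minimal sketch of a guard that could run before the `case`. The helper name `is_known_target` is hypothetical and not part of the original script; the accepted names are copied from the branches of the `case` statement.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not part of the original script):
# returns 0 when the argument is one of the targets that the case statement
# in dwd_to_dws.sh handles, and 1 otherwise.
is_known_target() {
    case "$1" in
        dws_visitor_action_daycount|dws_user_action_daycount|\
        dws_activity_info_daycount|dws_area_stats_daycount|\
        dws_sku_action_daycount|dws_coupon_info_daycount|all)
            return 0 ;;
        *)
            return 1 ;;
    esac
}

# Demonstration: report which names would be accepted.
for target in all dws_user_action_daycount dws_typo_daycount; do
    if is_known_target "$target"; then
        echo "$target: ok"
    else
        echo "$target: unknown target"
    fi
done
```

Inside the script, the check would be called as `is_known_target "$1" || { echo "Unknown target: $1" >&2; exit 1; }` so that a typo fails loudly instead of silently loading nothing.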
    

# Data Warehouse Construction: DWT Layer

# Visitor Topic

  1. Table creation statement

    DROP TABLE IF EXISTS dwt_visitor_topic;
    CREATE EXTERNAL TABLE dwt_visitor_topic
    (
        `mid_id` STRING COMMENT 'Device ID',
        `brand` STRING COMMENT 'Phone brand',
        `model` STRING COMMENT 'Phone model',
        `channel` ARRAY<STRING> COMMENT 'Channel',
        `os` ARRAY<STRING> COMMENT 'Operating system',
        `area_code` ARRAY<STRING> COMMENT 'Area ID',
        `version_code` ARRAY<STRING> COMMENT 'App version',
        `visit_date_first` STRING  COMMENT 'First visit date',
        `visit_date_last` STRING  COMMENT 'Last visit date',
        `visit_last_1d_count` BIGINT COMMENT 'Number of visits in the last 1 day',
        `visit_last_1d_day_count` BIGINT COMMENT 'Number of days with a visit in the last 1 day',
        `visit_last_7d_count` BIGINT COMMENT 'Number of visits in the last 7 days',
        `visit_last_7d_day_count` BIGINT COMMENT 'Number of days with a visit in the last 7 days',
        `visit_last_30d_count` BIGINT COMMENT 'Number of visits in the last 30 days',
        `visit_last_30d_day_count` BIGINT COMMENT 'Number of days with a visit in the last 30 days',
        `visit_count` BIGINT COMMENT 'Cumulative number of visits',
        `visit_day_count` BIGINT COMMENT 'Cumulative number of days with a visit'
    ) COMMENT 'Device topic wide table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwt/dwt_visitor_topic'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Load data

    insert overwrite table dwt_visitor_topic partition(dt='2020-06-14')
    select
        nvl(1d_ago.mid_id,old.mid_id),
        nvl(1d_ago.brand,old.brand),
        nvl(1d_ago.model,old.model),
        nvl(1d_ago.channel,old.channel),
        nvl(1d_ago.os,old.os),
        nvl(1d_ago.area_code,old.area_code),
        nvl(1d_ago.version_code,old.version_code),
        case when old.mid_id is null and 1d_ago.is_new=1 then '2020-06-14'
             when old.mid_id is null and 1d_ago.is_new=0 then '2020-06-13' -- the exact first-visit date is unknown, so use a date before the warehouse go-live day
             else old.visit_date_first end,
        if(1d_ago.mid_id is not null,'2020-06-14',old.visit_date_last),
        nvl(1d_ago.visit_count,0),
        if(1d_ago.mid_id is null,0,1),
        nvl(old.visit_last_7d_count,0)+nvl(1d_ago.visit_count,0)- nvl(7d_ago.visit_count,0),
        nvl(old.visit_last_7d_day_count,0)+if(1d_ago.mid_id is null,0,1)- if(7d_ago.mid_id is null,0,1),
        nvl(old.visit_last_30d_count,0)+nvl(1d_ago.visit_count,0)- nvl(30d_ago.visit_count,0),
        nvl(old.visit_last_30d_day_count,0)+if(1d_ago.mid_id is null,0,1)- if(30d_ago.mid_id is null,0,1),
        nvl(old.visit_count,0)+nvl(1d_ago.visit_count,0),
        nvl(old.visit_day_count,0)+if(1d_ago.mid_id is null,0,1)
    from
    (
        select
            mid_id,
            brand,
            model,
            channel,
            os,
            area_code,
            version_code,
            visit_date_first,
            visit_date_last,
            visit_last_1d_count,
            visit_last_1d_day_count,
            visit_last_7d_count,
            visit_last_7d_day_count,
            visit_last_30d_count,
            visit_last_30d_day_count,
            visit_count,
            visit_day_count
        from dwt_visitor_topic
        where dt=date_add('2020-06-14',-1)
    )old
    full outer join
    (
        select
            mid_id,
            brand,
            model,
            is_new,
            channel,
            os,
            area_code,
            version_code,
            visit_count
        from dws_visitor_action_daycount
        where dt='2020-06-14'
    )1d_ago
    on old.mid_id=1d_ago.mid_id
    left join
    (
        select
            mid_id,
            brand,
            model,
            is_new,
            channel,
            os,
            area_code,
            version_code,
            visit_count
        from dws_visitor_action_daycount
        where dt=date_add('2020-06-14',-7)
    )7d_ago
    on old.mid_id=7d_ago.mid_id
    left join
    (
        select
            mid_id,
            brand,
            model,
            is_new,
            channel,
            os,
            area_code,
            version_code,
            visit_count
        from dws_visitor_action_daycount
        where dt=date_add('2020-06-14',-30)
    )30d_ago
    on old.mid_id=30d_ago.mid_id;
    
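The 7-day and 30-day columns in the load above are not recomputed from scratch. Each takes yesterday's window total, adds today's value, and subtracts the value of the day that just dropped out of the window. Below is a small sketch of that arithmetic with made-up daily counts. `daily`, `sum_range`, and `roll_window` are illustrative names, not part of the warehouse scripts.

```shell
#!/bin/bash
# Hypothetical daily visit counts for one visitor, oldest first (8 consecutive days).
daily=(2 0 1 4 0 3 5 6)

# Sum a slice of the array: sum_range <start_index> <length>
sum_range() {
  local start=$1 len=$2 total=0 v
  for v in "${daily[@]:$start:$len}"; do
    total=$(( total + v ))
  done
  echo "$total"
}

# Roll a window forward by one day.
# Empty arguments count as 0, mirroring nvl(x,0) in the query.
roll_window() {
  local old_total=${1:-0} today=${2:-0} expired=${3:-0}
  echo $(( old_total + today - expired ))
}

old_7d=$(sum_range 0 7)                                      # window ending yesterday
new_7d=$(roll_window "$old_7d" "${daily[7]}" "${daily[0]}")  # add today, drop the oldest day
echo "rolled: $new_7d, recomputed: $(sum_range 1 7)"         # prints "rolled: 19, recomputed: 19"
```

Because each day's row is derived from the previous day's partition, the daily load has to run in date order. A skipped day would leave the window totals out of step with the underlying daily counts.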

# User Topic

  1. Create table statement

    DROP TABLE IF EXISTS dwt_user_topic;
    CREATE EXTERNAL TABLE dwt_user_topic
    (
        `user_id` STRING COMMENT 'User ID',
        `login_date_first` STRING COMMENT 'First active date',
        `login_date_last` STRING COMMENT 'Last active date',
        `login_last_1d_count` BIGINT COMMENT 'Login count in the last 1 day',
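        `login_last_1d_day_count` BIGINT COMMENT 'Days with a login in the last 1 day',
        `login_last_7d_count` BIGINT COMMENT 'Login count in the last 7 days',
        `login_last_7d_day_count` BIGINT COMMENT 'Days with a login in the last 7 days',
        `login_last_30d_count` BIGINT COMMENT 'Login count in the last 30 days',
        `login_last_30d_day_count` BIGINT COMMENT 'Days with a login in the last 30 days',
        `login_count` BIGINT COMMENT 'Cumulative login count',
        `login_day_count` BIGINT COMMENT 'Cumulative days with a login',
        `order_date_first` STRING COMMENT 'First order date',
        `order_date_last` STRING COMMENT 'Last order date',
        `order_last_1d_count` BIGINT COMMENT 'Order count in the last 1 day',
        `order_activity_last_1d_count` BIGINT COMMENT 'Count of orders joining an activity in the last 1 day',
        `order_activity_reduce_last_1d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (activity) in the last 1 day',
        `order_coupon_last_1d_count` BIGINT COMMENT 'Count of orders using a coupon in the last 1 day',
        `order_coupon_reduce_last_1d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (coupon) in the last 1 day',
        `order_last_1d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 1 day',
        `order_last_1d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 1 day',
        `order_last_7d_count` BIGINT COMMENT 'Order count in the last 7 days',
        `order_activity_last_7d_count` BIGINT COMMENT 'Count of orders joining an activity in the last 7 days',
        `order_activity_reduce_last_7d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (activity) in the last 7 days',
        `order_coupon_last_7d_count` BIGINT COMMENT 'Count of orders using a coupon in the last 7 days',
        `order_coupon_reduce_last_7d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (coupon) in the last 7 days',
        `order_last_7d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 7 days',
        `order_last_7d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 7 days',
        `order_last_30d_count` BIGINT COMMENT 'Order count in the last 30 days',
        `order_activity_last_30d_count` BIGINT COMMENT 'Count of orders joining an activity in the last 30 days',
        `order_activity_reduce_last_30d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (activity) in the last 30 days',
        `order_coupon_last_30d_count` BIGINT COMMENT 'Count of orders using a coupon in the last 30 days',
        `order_coupon_reduce_last_30d_amount` DECIMAL(16,2) COMMENT 'Order discount amount (coupon) in the last 30 days',
        `order_last_30d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 30 days',
        `order_last_30d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 30 days',
        `order_count` BIGINT COMMENT 'Cumulative order count',
        `order_activity_count` BIGINT COMMENT 'Cumulative count of orders joining an activity',
        `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative order discount amount (activity)',
        `order_coupon_count` BIGINT COMMENT 'Cumulative count of orders using a coupon',
        `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative order discount amount (coupon)',
        `order_original_amount` DECIMAL(16,2) COMMENT 'Cumulative original order amount',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Cumulative final order amount',
        `payment_date_first` STRING COMMENT 'First payment date',
        `payment_date_last` STRING COMMENT 'Last payment date',
        `payment_last_1d_count` BIGINT COMMENT 'Payment count in the last 1 day',
        `payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 1 day',
        `payment_last_7d_count` BIGINT COMMENT 'Payment count in the last 7 days',
        `payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 7 days',
        `payment_last_30d_count` BIGINT COMMENT 'Payment count in the last 30 days',
        `payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 30 days',
        `payment_count` BIGINT COMMENT 'Cumulative payment count',
        `payment_amount` DECIMAL(16,2) COMMENT 'Cumulative payment amount',
        `refund_order_last_1d_count` BIGINT COMMENT 'Order refund count in the last 1 day',
        `refund_order_last_1d_num` BIGINT COMMENT 'Order refund item quantity in the last 1 day',
        `refund_order_last_1d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 1 day',
        `refund_order_last_7d_count` BIGINT COMMENT 'Order refund count in the last 7 days',
        `refund_order_last_7d_num` BIGINT COMMENT 'Order refund item quantity in the last 7 days',
        `refund_order_last_7d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 7 days',
        `refund_order_last_30d_count` BIGINT COMMENT 'Order refund count in the last 30 days',
        `refund_order_last_30d_num` BIGINT COMMENT 'Order refund item quantity in the last 30 days',
        `refund_order_last_30d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 30 days',
        `refund_order_count` BIGINT COMMENT 'Cumulative order refund count',
        `refund_order_num` BIGINT COMMENT 'Cumulative order refund item quantity',
        `refund_order_amount` DECIMAL(16,2) COMMENT 'Cumulative order refund amount',
        `refund_payment_last_1d_count` BIGINT COMMENT 'Refund payment count in the last 1 day',
        `refund_payment_last_1d_num` BIGINT COMMENT 'Refund payment item quantity in the last 1 day',
        `refund_payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Refund payment amount in the last 1 day',
        `refund_payment_last_7d_count` BIGINT COMMENT 'Refund payment count in the last 7 days',
        `refund_payment_last_7d_num` BIGINT COMMENT 'Refund payment item quantity in the last 7 days',
        `refund_payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Refund payment amount in the last 7 days',
        `refund_payment_last_30d_count` BIGINT COMMENT 'Refund payment count in the last 30 days',
        `refund_payment_last_30d_num` BIGINT COMMENT 'Refund payment item quantity in the last 30 days',
        `refund_payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Refund payment amount in the last 30 days',
        `refund_payment_count` BIGINT COMMENT 'Cumulative refund payment count',
        `refund_payment_num` BIGINT COMMENT 'Cumulative refund payment item quantity',
        `refund_payment_amount` DECIMAL(16,2) COMMENT 'Cumulative refund payment amount',
        `cart_last_1d_count` BIGINT COMMENT 'Add-to-cart count in the last 1 day',
        `cart_last_7d_count` BIGINT COMMENT 'Add-to-cart count in the last 7 days',
        `cart_last_30d_count` BIGINT COMMENT 'Add-to-cart count in the last 30 days',
        `cart_count` BIGINT COMMENT 'Cumulative add-to-cart count',
        `favor_last_1d_count` BIGINT COMMENT 'Favorite count in the last 1 day',
        `favor_last_7d_count` BIGINT COMMENT 'Favorite count in the last 7 days',
        `favor_last_30d_count` BIGINT COMMENT 'Favorite count in the last 30 days',
        `favor_count` BIGINT COMMENT 'Cumulative favorite count',
        `coupon_last_1d_get_count` BIGINT COMMENT 'Coupon claim count in the last 1 day',
        `coupon_last_1d_using_count` BIGINT COMMENT 'Coupon use (order) count in the last 1 day',
        `coupon_last_1d_used_count` BIGINT COMMENT 'Coupon use (payment) count in the last 1 day',
        `coupon_last_7d_get_count` BIGINT COMMENT 'Coupon claim count in the last 7 days',
        `coupon_last_7d_using_count` BIGINT COMMENT 'Coupon use (order) count in the last 7 days',
        `coupon_last_7d_used_count` BIGINT COMMENT 'Coupon use (payment) count in the last 7 days',
        `coupon_last_30d_get_count` BIGINT COMMENT 'Coupon claim count in the last 30 days',
        `coupon_last_30d_using_count` BIGINT COMMENT 'Coupon use (order) count in the last 30 days',
        `coupon_last_30d_used_count` BIGINT COMMENT 'Coupon use (payment) count in the last 30 days',
        `coupon_get_count` BIGINT COMMENT 'Cumulative coupon claim count',
        `coupon_using_count` BIGINT COMMENT 'Cumulative coupon use (order) count',
        `coupon_used_count` BIGINT COMMENT 'Cumulative coupon use (payment) count',
        `appraise_last_1d_good_count` BIGINT COMMENT 'Positive review count in the last 1 day',
        `appraise_last_1d_mid_count` BIGINT COMMENT 'Neutral review count in the last 1 day',
        `appraise_last_1d_bad_count` BIGINT COMMENT 'Negative review count in the last 1 day',
        `appraise_last_1d_default_count` BIGINT COMMENT 'Default review count in the last 1 day',
        `appraise_last_7d_good_count` BIGINT COMMENT 'Positive review count in the last 7 days',
        `appraise_last_7d_mid_count` BIGINT COMMENT 'Neutral review count in the last 7 days',
        `appraise_last_7d_bad_count` BIGINT COMMENT 'Negative review count in the last 7 days',
        `appraise_last_7d_default_count` BIGINT COMMENT 'Default review count in the last 7 days',
        `appraise_last_30d_good_count` BIGINT COMMENT 'Positive review count in the last 30 days',
        `appraise_last_30d_mid_count` BIGINT COMMENT 'Neutral review count in the last 30 days',
        `appraise_last_30d_bad_count` BIGINT COMMENT 'Negative review count in the last 30 days',
        `appraise_last_30d_default_count` BIGINT COMMENT 'Default review count in the last 30 days',
        `appraise_good_count` BIGINT COMMENT 'Cumulative positive review count',
        `appraise_mid_count` BIGINT COMMENT 'Cumulative neutral review count',
        `appraise_bad_count` BIGINT COMMENT 'Cumulative negative review count',
        `appraise_default_count` BIGINT COMMENT 'Cumulative default review count'
    )COMMENT 'Member topic wide table'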
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwt/dwt_user_topic/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Load data

    1. First-day load

      insert overwrite table dwt_user_topic partition(dt='2020-06-14')
      select
          id,
          login_date_first, -- use the user's creation date as the first login date
          nvl(login_date_last,date_add('2020-06-14',-1)), -- take the last login date from history when it exists; otherwise fall back to a fixed date
          nvl(login_last_1d_count,0),
          nvl(login_last_1d_day_count,0),
          nvl(login_last_7d_count,0),
          nvl(login_last_7d_day_count,0),
          nvl(login_last_30d_count,0),
          nvl(login_last_30d_day_count,0),
          nvl(login_count,0),
          nvl(login_day_count,0),
          order_date_first,
          order_date_last,
          nvl(order_last_1d_count,0),
          nvl(order_activity_last_1d_count,0),
          nvl(order_activity_reduce_last_1d_amount,0),
          nvl(order_coupon_last_1d_count,0),
          nvl(order_coupon_reduce_last_1d_amount,0),
          nvl(order_last_1d_original_amount,0),
          nvl(order_last_1d_final_amount,0),
          nvl(order_last_7d_count,0),
          nvl(order_activity_last_7d_count,0),
          nvl(order_activity_reduce_last_7d_amount,0),
          nvl(order_coupon_last_7d_count,0),
          nvl(order_coupon_reduce_last_7d_amount,0),
          nvl(order_last_7d_original_amount,0),
          nvl(order_last_7d_final_amount,0),
          nvl(order_last_30d_count,0),
          nvl(order_activity_last_30d_count,0),
          nvl(order_activity_reduce_last_30d_amount,0),
          nvl(order_coupon_last_30d_count,0),
          nvl(order_coupon_reduce_last_30d_amount,0),
          nvl(order_last_30d_original_amount,0),
          nvl(order_last_30d_final_amount,0),
          nvl(order_count,0),
          nvl(order_activity_count,0),
          nvl(order_activity_reduce_amount,0),
          nvl(order_coupon_count,0),
          nvl(order_coupon_reduce_amount,0),
          nvl(order_original_amount,0),
          nvl(order_final_amount,0),
          payment_date_first,
          payment_date_last,
          nvl(payment_last_1d_count,0),
          nvl(payment_last_1d_amount,0),
          nvl(payment_last_7d_count,0),
          nvl(payment_last_7d_amount,0),
          nvl(payment_last_30d_count,0),
          nvl(payment_last_30d_amount,0),
          nvl(payment_count,0),
          nvl(payment_amount,0),
          nvl(refund_order_last_1d_count,0),
          nvl(refund_order_last_1d_num,0),
          nvl(refund_order_last_1d_amount,0),
          nvl(refund_order_last_7d_count,0),
          nvl(refund_order_last_7d_num,0),
          nvl(refund_order_last_7d_amount,0),
          nvl(refund_order_last_30d_count,0),
          nvl(refund_order_last_30d_num,0),
          nvl(refund_order_last_30d_amount,0),
          nvl(refund_order_count,0),
          nvl(refund_order_num,0),
          nvl(refund_order_amount,0),
          nvl(refund_payment_last_1d_count,0),
          nvl(refund_payment_last_1d_num,0),
          nvl(refund_payment_last_1d_amount,0),
          nvl(refund_payment_last_7d_count,0),
          nvl(refund_payment_last_7d_num,0),
          nvl(refund_payment_last_7d_amount,0),
          nvl(refund_payment_last_30d_count,0),
          nvl(refund_payment_last_30d_num,0),
          nvl(refund_payment_last_30d_amount,0),
          nvl(refund_payment_count,0),
          nvl(refund_payment_num,0),
          nvl(refund_payment_amount,0),
          nvl(cart_last_1d_count,0),
          nvl(cart_last_7d_count,0),
          nvl(cart_last_30d_count,0),
          nvl(cart_count,0),
          nvl(favor_last_1d_count,0),
          nvl(favor_last_7d_count,0),
          nvl(favor_last_30d_count,0),
          nvl(favor_count,0),
          nvl(coupon_last_1d_get_count,0),
          nvl(coupon_last_1d_using_count,0),
          nvl(coupon_last_1d_used_count,0),
          nvl(coupon_last_7d_get_count,0),
          nvl(coupon_last_7d_using_count,0),
          nvl(coupon_last_7d_used_count,0),
          nvl(coupon_last_30d_get_count,0),
          nvl(coupon_last_30d_using_count,0),
          nvl(coupon_last_30d_used_count,0),
          nvl(coupon_get_count,0),
          nvl(coupon_using_count,0),
          nvl(coupon_used_count,0),
          nvl(appraise_last_1d_good_count,0),
          nvl(appraise_last_1d_mid_count,0),
          nvl(appraise_last_1d_bad_count,0),
          nvl(appraise_last_1d_default_count,0),
          nvl(appraise_last_7d_good_count,0),
          nvl(appraise_last_7d_mid_count,0),
          nvl(appraise_last_7d_bad_count,0),
          nvl(appraise_last_7d_default_count,0),
          nvl(appraise_last_30d_good_count,0),
          nvl(appraise_last_30d_mid_count,0),
          nvl(appraise_last_30d_bad_count,0),
          nvl(appraise_last_30d_default_count,0),
          nvl(appraise_good_count,0),
          nvl(appraise_mid_count,0),
          nvl(appraise_bad_count,0),
          nvl(appraise_default_count,0)
      from
      (
          select
              id,
              date_format(create_time,'yyyy-MM-dd') login_date_first
          from dim_user_info
          where dt='9999-99-99'
      )t1
      left join
      (
          select
              user_id user_id,
              max(dt) login_date_last,
              sum(if(dt='2020-06-14',login_count,0)) login_last_1d_count,
              sum(if(dt='2020-06-14' and login_count>0,1,0)) login_last_1d_day_count,
              sum(if(dt>=date_add('2020-06-14',-6),login_count,0)) login_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6) and login_count>0,1,0)) login_last_7d_day_count,
              sum(if(dt>=date_add('2020-06-14',-29),login_count,0)) login_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29) and login_count>0,1,0)) login_last_30d_day_count,
              sum(login_count) login_count,
              sum(if(login_count>0,1,0)) login_day_count,
              min(if(order_count>0,dt,null)) order_date_first,
              max(if(order_count>0,dt,null)) order_date_last,
              sum(if(dt='2020-06-14',order_count,0)) order_last_1d_count,
              sum(if(dt='2020-06-14',order_activity_count,0)) order_activity_last_1d_count,
              sum(if(dt='2020-06-14',order_activity_reduce_amount,0)) order_activity_reduce_last_1d_amount,
              sum(if(dt='2020-06-14',order_coupon_count,0)) order_coupon_last_1d_count,
              sum(if(dt='2020-06-14',order_coupon_reduce_amount,0)) order_coupon_reduce_last_1d_amount,
              sum(if(dt='2020-06-14',order_original_amount,0)) order_last_1d_original_amount,
              sum(if(dt='2020-06-14',order_final_amount,0)) order_last_1d_final_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_count,0)) order_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),order_activity_count,0)) order_activity_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),order_activity_reduce_amount,0)) order_activity_reduce_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_coupon_count,0)) order_coupon_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),order_coupon_reduce_amount,0)) order_coupon_reduce_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_original_amount,0)) order_last_7d_original_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_final_amount,0)) order_last_7d_final_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_count,0)) order_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),order_activity_count,0)) order_activity_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),order_activity_reduce_amount,0)) order_activity_reduce_last_30d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_coupon_count,0)) order_coupon_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),order_coupon_reduce_amount,0)) order_coupon_reduce_last_30d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_original_amount,0)) order_last_30d_original_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_final_amount,0)) order_last_30d_final_amount,
              sum(order_count) order_count,
              sum(order_activity_count) order_activity_count,
              sum(order_activity_reduce_amount) order_activity_reduce_amount,
              sum(order_coupon_count) order_coupon_count,
              sum(order_coupon_reduce_amount) order_coupon_reduce_amount,
              sum(order_original_amount) order_original_amount,
              sum(order_final_amount) order_final_amount,
              min(if(payment_count>0,dt,null)) payment_date_first,
              max(if(payment_count>0,dt,null)) payment_date_last,
              sum(if(dt='2020-06-14',payment_count,0)) payment_last_1d_count,
              sum(if(dt='2020-06-14',payment_amount,0)) payment_last_1d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),payment_count,0)) payment_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),payment_amount,0)) payment_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),payment_count,0)) payment_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),payment_amount,0)) payment_last_30d_amount,
              sum(payment_count) payment_count,
              sum(payment_amount) payment_amount,
              sum(if(dt='2020-06-14',refund_order_count,0)) refund_order_last_1d_count,
              sum(if(dt='2020-06-14',refund_order_num,0)) refund_order_last_1d_num,
              sum(if(dt='2020-06-14',refund_order_amount,0)) refund_order_last_1d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),refund_order_count,0)) refund_order_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),refund_order_num,0)) refund_order_last_7d_num,
              sum(if(dt>=date_add('2020-06-14',-6),refund_order_amount,0)) refund_order_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),refund_order_count,0)) refund_order_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),refund_order_num,0)) refund_order_last_30d_num,
              sum(if(dt>=date_add('2020-06-14',-29),refund_order_amount,0)) refund_order_last_30d_amount,
              sum(refund_order_count) refund_order_count,
              sum(refund_order_num) refund_order_num,
              sum(refund_order_amount) refund_order_amount,
              sum(if(dt='2020-06-14',refund_payment_count,0)) refund_payment_last_1d_count,
              sum(if(dt='2020-06-14',refund_payment_num,0)) refund_payment_last_1d_num,
              sum(if(dt='2020-06-14',refund_payment_amount,0)) refund_payment_last_1d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),refund_payment_count,0)) refund_payment_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),refund_payment_num,0)) refund_payment_last_7d_num,
              sum(if(dt>=date_add('2020-06-14',-6),refund_payment_amount,0)) refund_payment_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),refund_payment_count,0)) refund_payment_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),refund_payment_num,0)) refund_payment_last_30d_num,
              sum(if(dt>=date_add('2020-06-14',-29),refund_payment_amount,0)) refund_payment_last_30d_amount,
              sum(refund_payment_count) refund_payment_count,
              sum(refund_payment_num) refund_payment_num,
              sum(refund_payment_amount) refund_payment_amount,
              sum(if(dt='2020-06-14',cart_count,0)) cart_last_1d_count,
              sum(if(dt>=date_add('2020-06-14',-6),cart_count,0)) cart_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-29),cart_count,0)) cart_last_30d_count,
              sum(cart_count) cart_count,
              sum(if(dt='2020-06-14',favor_count,0)) favor_last_1d_count,
              sum(if(dt>=date_add('2020-06-14',-6),favor_count,0)) favor_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-29),favor_count,0)) favor_last_30d_count,
              sum(favor_count) favor_count,
              sum(if(dt='2020-06-14',coupon_get_count,0)) coupon_last_1d_get_count,
              sum(if(dt='2020-06-14',coupon_using_count,0)) coupon_last_1d_using_count,
              sum(if(dt='2020-06-14',coupon_used_count,0)) coupon_last_1d_used_count,
              sum(if(dt>=date_add('2020-06-14',-6),coupon_get_count,0)) coupon_last_7d_get_count,
              sum(if(dt>=date_add('2020-06-14',-6),coupon_using_count,0)) coupon_last_7d_using_count,
              sum(if(dt>=date_add('2020-06-14',-6),coupon_used_count,0)) coupon_last_7d_used_count,
              sum(if(dt>=date_add('2020-06-14',-29),coupon_get_count,0)) coupon_last_30d_get_count,
              sum(if(dt>=date_add('2020-06-14',-29),coupon_using_count,0)) coupon_last_30d_using_count,
              sum(if(dt>=date_add('2020-06-14',-29),coupon_used_count,0)) coupon_last_30d_used_count,
              sum(coupon_get_count) coupon_get_count,
              sum(coupon_using_count) coupon_using_count,
              sum(coupon_used_count) coupon_used_count,
              sum(if(dt='2020-06-14',appraise_good_count,0)) appraise_last_1d_good_count,
              sum(if(dt='2020-06-14',appraise_mid_count,0)) appraise_last_1d_mid_count,
              sum(if(dt='2020-06-14',appraise_bad_count,0)) appraise_last_1d_bad_count,
              sum(if(dt='2020-06-14',appraise_default_count,0)) appraise_last_1d_default_count,
              sum(if(dt>=date_add('2020-06-14',-6),appraise_good_count,0)) appraise_last_7d_good_count,
              sum(if(dt>=date_add('2020-06-14',-6),appraise_mid_count,0)) appraise_last_7d_mid_count,
              sum(if(dt>=date_add('2020-06-14',-6),appraise_bad_count,0)) appraise_last_7d_bad_count,
              sum(if(dt>=date_add('2020-06-14',-6),appraise_default_count,0)) appraise_last_7d_default_count,
              sum(if(dt>=date_add('2020-06-14',-29),appraise_good_count,0)) appraise_last_30d_good_count,
              sum(if(dt>=date_add('2020-06-14',-29),appraise_mid_count,0)) appraise_last_30d_mid_count,
              sum(if(dt>=date_add('2020-06-14',-29),appraise_bad_count,0)) appraise_last_30d_bad_count,
              sum(if(dt>=date_add('2020-06-14',-29),appraise_default_count,0)) appraise_last_30d_default_count,
              sum(appraise_good_count) appraise_good_count,
              sum(appraise_mid_count) appraise_mid_count,
              sum(appraise_bad_count) appraise_bad_count,
              sum(appraise_default_count) appraise_default_count
          from dws_user_action_daycount
          group by user_id
      )t2
      on t1.id=t2.user_id;
      
    2. Daily load

      insert overwrite table dwt_user_topic partition(dt='2020-06-15')
      select
          nvl(1d_ago.user_id,old.user_id),
          nvl(old.login_date_first,'2020-06-15'),
          if(1d_ago.user_id is not null,'2020-06-15',old.login_date_last),
          nvl(1d_ago.login_count,0),
          if(1d_ago.user_id is not null,1,0),
          nvl(old.login_last_7d_count,0)+nvl(1d_ago.login_count,0)- nvl(7d_ago.login_count,0),
          nvl(old.login_last_7d_day_count,0)+if(1d_ago.user_id is null,0,1)- if(7d_ago.user_id is null,0,1),
          nvl(old.login_last_30d_count,0)+nvl(1d_ago.login_count,0)- nvl(30d_ago.login_count,0),
          nvl(old.login_last_30d_day_count,0)+if(1d_ago.user_id is null,0,1)- if(30d_ago.user_id is null,0,1),
          nvl(old.login_count,0)+nvl(1d_ago.login_count,0),
          nvl(old.login_day_count,0)+if(1d_ago.user_id is not null,1,0),
          if(old.order_date_first is null and 1d_ago.order_count>0, '2020-06-15', old.order_date_first),
          if(1d_ago.order_count>0,'2020-06-15',old.order_date_last),
          nvl(1d_ago.order_count,0),
          nvl(1d_ago.order_activity_count,0),
          nvl(1d_ago.order_activity_reduce_amount,0.0),
          nvl(1d_ago.order_coupon_count,0),
          nvl(1d_ago.order_coupon_reduce_amount,0.0),
          nvl(1d_ago.order_original_amount,0.0),
          nvl(1d_ago.order_final_amount,0.0),
          nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)- nvl(7d_ago.order_count,0),
          nvl(old.order_activity_last_7d_count,0)+nvl(1d_ago.order_activity_count,0)- nvl(7d_ago.order_activity_count,0),
          nvl(old.order_activity_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)- nvl(7d_ago.order_activity_reduce_amount,0.0),
          nvl(old.order_coupon_last_7d_count,0)+nvl(1d_ago.order_coupon_count,0)- nvl(7d_ago.order_coupon_count,0),
          nvl(old.order_coupon_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)- nvl(7d_ago.order_coupon_reduce_amount,0.0),
          nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(7d_ago.order_original_amount,0.0),
          nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(7d_ago.order_final_amount,0.0),
          nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)- nvl(30d_ago.order_count,0),
          nvl(old.order_activity_last_30d_count,0)+nvl(1d_ago.order_activity_count,0)- nvl(30d_ago.order_activity_count,0),
          nvl(old.order_activity_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)- nvl(30d_ago.order_activity_reduce_amount,0.0),
          nvl(old.order_coupon_last_30d_count,0)+nvl(1d_ago.order_coupon_count,0)- nvl(30d_ago.order_coupon_count,0),
          nvl(old.order_coupon_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)- nvl(30d_ago.order_coupon_reduce_amount,0.0),
          nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(30d_ago.order_original_amount,0.0),
          nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(30d_ago.order_final_amount,0.0),
          nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
          nvl(old.order_activity_count,0)+nvl(1d_ago.order_activity_count,0),
          nvl(old.order_activity_reduce_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0),
          nvl(old.order_coupon_count,0)+nvl(1d_ago.order_coupon_count,0),
          nvl(old.order_coupon_reduce_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0),
          nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
          nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
          if(old.payment_date_first is null and 1d_ago.payment_count>0, '2020-06-15', old.payment_date_first),
          if(1d_ago.payment_count>0,'2020-06-15',old.payment_date_last),
          nvl(1d_ago.payment_count,0),
          nvl(1d_ago.payment_amount,0.0),
          nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)-nvl(7d_ago.payment_count,0),
          nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(7d_ago.payment_amount,0.0),
          nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)-nvl(30d_ago.payment_count,0),
          nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(30d_ago.payment_amount,0.0),
          nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
          nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
          nvl(1d_ago.refund_order_count,0),
          nvl(1d_ago.refund_order_num,0),
          nvl(1d_ago.refund_order_amount,0.0),
          nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(7d_ago.refund_order_count,0),
          nvl(old.refund_order_last_7d_num,0)+nvl(1d_ago.refund_order_num, 0)- nvl(7d_ago.refund_order_num,0),
          nvl(old.refund_order_last_7d_amount,0.0)+ nvl(1d_ago.refund_order_amount,0.0)- nvl(7d_ago.refund_order_amount,0.0),
          nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(30d_ago.refund_order_count,0),
          nvl(old.refund_order_last_30d_num,0)+nvl(1d_ago.refund_order_num, 0)- nvl(30d_ago.refund_order_num,0),
          nvl(old.refund_order_last_30d_amount,0.0)+ nvl(1d_ago.refund_order_amount,0.0)- nvl(30d_ago.refund_order_amount,0.0),
          nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
          nvl(old.refund_order_num,0)+nvl(1d_ago.refund_order_num,0),
          nvl(old.refund_order_amount,0.0)+ nvl(1d_ago.refund_order_amount,0.0),
          nvl(1d_ago.refund_payment_count,0),
          nvl(1d_ago.refund_payment_num,0),
          nvl(1d_ago.refund_payment_amount,0.0),
          nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(7d_ago.refund_payment_count,0),
          nvl(old.refund_payment_last_7d_num,0)+nvl(1d_ago.refund_payment_num,0)- nvl(7d_ago.refund_payment_num,0),
          nvl(old.refund_payment_last_7d_amount,0.0)+ nvl(1d_ago.refund_payment_amount,0.0)- nvl(7d_ago.refund_payment_amount,0.0),
          nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(30d_ago.refund_payment_count,0),
          nvl(old.refund_payment_last_30d_num,0)+nvl(1d_ago.refund_payment_num,0)- nvl(30d_ago.refund_payment_num,0),
          nvl(old.refund_payment_last_30d_amount,0.0)+ nvl(1d_ago.refund_payment_amount,0.0)- nvl(30d_ago.refund_payment_amount,0.0),
          nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
          nvl(old.refund_payment_num,0)+nvl(1d_ago.refund_payment_num,0),
          nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0),
          nvl(1d_ago.cart_count,0),
          nvl(old.cart_last_7d_count,0)+nvl(1d_ago.cart_count,0)-nvl(7d_ago.cart_count,0),
          nvl(old.cart_last_30d_count,0)+nvl(1d_ago.cart_count,0)-nvl(30d_ago.cart_count,0),
          nvl(old.cart_count,0)+nvl(1d_ago.cart_count,0),
          nvl(1d_ago.favor_count,0),
          nvl(old.favor_last_7d_count,0)+nvl(1d_ago.favor_count,0)- nvl(7d_ago.favor_count,0),
          nvl(old.favor_last_30d_count,0)+nvl(1d_ago.favor_count,0)- nvl(30d_ago.favor_count,0),
          nvl(old.favor_count,0)+nvl(1d_ago.favor_count,0),
          nvl(1d_ago.coupon_get_count,0),
          nvl(1d_ago.coupon_using_count,0),
          nvl(1d_ago.coupon_used_count,0),
          nvl(old.coupon_last_7d_get_count,0)+nvl(1d_ago.coupon_get_count,0)- nvl(7d_ago.coupon_get_count,0),
          nvl(old.coupon_last_7d_using_count,0)+nvl(1d_ago.coupon_using_count,0)- nvl(7d_ago.coupon_using_count,0),
          nvl(old.coupon_last_7d_used_count,0)+ nvl(1d_ago.coupon_used_count,0)- nvl(7d_ago.coupon_used_count,0),
          nvl(old.coupon_last_30d_get_count,0)+nvl(1d_ago.coupon_get_count,0)- nvl(30d_ago.coupon_get_count,0),
          nvl(old.coupon_last_30d_using_count,0)+nvl(1d_ago.coupon_using_count,0)- nvl(30d_ago.coupon_using_count,0),
          nvl(old.coupon_last_30d_used_count,0)+ nvl(1d_ago.coupon_used_count,0)- nvl(30d_ago.coupon_used_count,0),
          nvl(old.coupon_get_count,0)+nvl(1d_ago.coupon_get_count,0),
          nvl(old.coupon_using_count,0)+nvl(1d_ago.coupon_using_count,0),
          nvl(old.coupon_used_count,0)+nvl(1d_ago.coupon_used_count,0),
          nvl(1d_ago.appraise_good_count,0),
          nvl(1d_ago.appraise_mid_count,0),
          nvl(1d_ago.appraise_bad_count,0),
          nvl(1d_ago.appraise_default_count,0),
          nvl(old.appraise_last_7d_good_count,0)+nvl(1d_ago.appraise_good_count,0)- nvl(7d_ago.appraise_good_count,0),
          nvl(old.appraise_last_7d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(7d_ago.appraise_mid_count,0),
          nvl(old.appraise_last_7d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(7d_ago.appraise_bad_count,0),
          nvl(old.appraise_last_7d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(7d_ago.appraise_default_count,0),
          nvl(old.appraise_last_30d_good_count,0)+nvl(1d_ago.appraise_good_count,0)- nvl(30d_ago.appraise_good_count,0),
          nvl(old.appraise_last_30d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(30d_ago.appraise_mid_count,0),
          nvl(old.appraise_last_30d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(30d_ago.appraise_bad_count,0),
          nvl(old.appraise_last_30d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(30d_ago.appraise_default_count,0),
          nvl(old.appraise_good_count,0)+nvl(1d_ago.appraise_good_count,0),
          nvl(old.appraise_mid_count,0)+nvl(1d_ago.appraise_mid_count, 0),
          nvl(old.appraise_bad_count,0)+nvl(1d_ago.appraise_bad_count,0),
          nvl(old.appraise_default_count,0)+nvl(1d_ago.appraise_default_count,0)
      from
      (
          select
              user_id,
              login_date_first,
              login_date_last,
              login_date_1d_count,
              login_last_1d_day_count,
              login_last_7d_count,
              login_last_7d_day_count,
              login_last_30d_count,
              login_last_30d_day_count,
              login_count,
              login_day_count,
              order_date_first,
              order_date_last,
              order_last_1d_count,
              order_activity_last_1d_count,
              order_activity_reduce_last_1d_amount,
              order_coupon_last_1d_count,
              order_coupon_reduce_last_1d_amount,
              order_last_1d_original_amount,
              order_last_1d_final_amount,
              order_last_7d_count,
              order_activity_last_7d_count,
              order_activity_reduce_last_7d_amount,
              order_coupon_last_7d_count,
              order_coupon_reduce_last_7d_amount,
              order_last_7d_original_amount,
              order_last_7d_final_amount,
              order_last_30d_count,
              order_activity_last_30d_count,
              order_activity_reduce_last_30d_amount,
              order_coupon_last_30d_count,
              order_coupon_reduce_last_30d_amount,
              order_last_30d_original_amount,
              order_last_30d_final_amount,
              order_count,
              order_activity_count,
              order_activity_reduce_amount,
              order_coupon_count,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_date_first,
              payment_date_last,
              payment_last_1d_count,
              payment_last_1d_amount,
              payment_last_7d_count,
              payment_last_7d_amount,
              payment_last_30d_count,
              payment_last_30d_amount,
              payment_count,
              payment_amount,
              refund_order_last_1d_count,
              refund_order_last_1d_num,
              refund_order_last_1d_amount,
              refund_order_last_7d_count,
              refund_order_last_7d_num,
              refund_order_last_7d_amount,
              refund_order_last_30d_count,
              refund_order_last_30d_num,
              refund_order_last_30d_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              refund_payment_last_1d_count,
              refund_payment_last_1d_num,
              refund_payment_last_1d_amount,
              refund_payment_last_7d_count,
              refund_payment_last_7d_num,
              refund_payment_last_7d_amount,
              refund_payment_last_30d_count,
              refund_payment_last_30d_num,
              refund_payment_last_30d_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              cart_last_1d_count,
              cart_last_7d_count,
              cart_last_30d_count,
              cart_count,
              favor_last_1d_count,
              favor_last_7d_count,
              favor_last_30d_count,
              favor_count,
              coupon_last_1d_get_count,
              coupon_last_1d_using_count,
              coupon_last_1d_used_count,
              coupon_last_7d_get_count,
              coupon_last_7d_using_count,
              coupon_last_7d_used_count,
              coupon_last_30d_get_count,
              coupon_last_30d_using_count,
              coupon_last_30d_used_count,
              coupon_get_count,
              coupon_using_count,
              coupon_used_count,
              appraise_last_1d_good_count,
              appraise_last_1d_mid_count,
              appraise_last_1d_bad_count,
              appraise_last_1d_default_count,
              appraise_last_7d_good_count,
              appraise_last_7d_mid_count,
              appraise_last_7d_bad_count,
              appraise_last_7d_default_count,
              appraise_last_30d_good_count,
              appraise_last_30d_mid_count,
              appraise_last_30d_bad_count,
              appraise_last_30d_default_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from dwt_user_topic
          where dt=date_add('2020-06-15',-1)
      )old
      full outer join
      (
          select
              user_id,
              login_count,
              cart_count,
              favor_count,
              order_count,
              order_activity_count,
              order_activity_reduce_amount,
              order_coupon_count,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              coupon_get_count,
              coupon_using_count,
              coupon_used_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from dws_user_action_daycount
          where dt='2020-06-15'
      )1d_ago
      on old.user_id=1d_ago.user_id
      left join
      (
          select
              user_id,
              login_count,
              cart_count,
              favor_count,
              order_count,
              order_activity_count,
              order_activity_reduce_amount,
              order_coupon_count,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              coupon_get_count,
              coupon_using_count,
              coupon_used_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from dws_user_action_daycount
          where dt=date_add('2020-06-15',-7)
      )7d_ago
      on old.user_id=7d_ago.user_id
      left join
      (
          select
              user_id,
              login_count,
              cart_count,
              favor_count,
              order_count,
              order_activity_count,
              order_activity_reduce_amount,
              order_coupon_count,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              coupon_get_count,
              coupon_using_count,
              coupon_used_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from dws_user_action_daycount
          where dt=date_add('2020-06-15',-30)
      )30d_ago
      on old.user_id=30d_ago.user_id;
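
The daily load above does not rescan 7 or 30 days of history. It takes yesterday's rolling total from `old`, adds today's value from `1d_ago`, and subtracts the value that has just left the window (`7d_ago` or `30d_ago`), wrapping each term in `nvl(..., 0)`. The following is a minimal Python sketch of that arithmetic, not part of the original pipeline: the helper names `roll_window` and `window_sum` and the sample login counts are hypothetical, and `None` stands in for SQL `NULL`.

```python
from datetime import date, timedelta


def roll_window(old_total, today_value, expired_value):
    """Advance an N-day rolling total by one day.

    Mirrors nvl(old.x_last_Nd, 0) + nvl(1d_ago.x, 0) - nvl(Nd_ago.x, 0).
    None plays the role of SQL NULL and is treated as 0.
    """
    return (old_total or 0) + (today_value or 0) - (expired_value or 0)


def window_sum(daily, end_day, n_days):
    """Recompute the N-day total ending on end_day from scratch."""
    return sum(daily.get(end_day - timedelta(days=i), 0) for i in range(n_days))


# Hypothetical daily login counts for one user, 2020-06-08 .. 2020-06-15.
counts = [2, 0, 1, 3, 0, 0, 4, 5]
daily = {date(2020, 6, 8) + timedelta(days=i): c for i, c in enumerate(counts)}

today = date(2020, 6, 15)

# Yesterday's 7-day total covers 2020-06-08 .. 2020-06-14.
old_7d = window_sum(daily, today - timedelta(days=1), 7)

# Roll forward: add today, drop the day that is now 7 days back (2020-06-08).
new_7d = roll_window(old_7d, daily.get(today), daily.get(today - timedelta(days=7)))

# The incremental result should match a full recomputation over 06-09 .. 06-15.
print(new_7d, window_sum(daily, today, 7))
```

The same pattern applies to the 30-day columns with an offset of 30 days. The incremental form is only correct if the previous day's partition of the topic table is itself correct, so a bad day propagates until the table is rebuilt from the daily counts.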
      

# Product topic

  1. Table creation statement

    DROP TABLE IF EXISTS dwt_sku_topic;
    CREATE EXTERNAL TABLE dwt_sku_topic
    (
        `sku_id` STRING COMMENT 'sku_id',
        `order_last_1d_count` BIGINT COMMENT 'Order count, last 1 day',
        `order_last_1d_num` BIGINT COMMENT 'Units ordered, last 1 day',
        `order_activity_last_1d_count` BIGINT COMMENT 'Order count with activity, last 1 day',
        `order_coupon_last_1d_count` BIGINT COMMENT 'Order count with coupon, last 1 day',
        `order_activity_reduce_last_1d_amount` DECIMAL(16,2) COMMENT 'Discount amount (activity), last 1 day',
        `order_coupon_reduce_last_1d_amount` DECIMAL(16,2) COMMENT 'Discount amount (coupon), last 1 day',
        `order_last_1d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount, last 1 day',
        `order_last_1d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount, last 1 day',
        `order_last_7d_count` BIGINT COMMENT 'Order count, last 7 days',
        `order_last_7d_num` BIGINT COMMENT 'Units ordered, last 7 days',
        `order_activity_last_7d_count` BIGINT COMMENT 'Order count with activity, last 7 days',
        `order_coupon_last_7d_count` BIGINT COMMENT 'Order count with coupon, last 7 days',
        `order_activity_reduce_last_7d_amount` DECIMAL(16,2) COMMENT 'Discount amount (activity), last 7 days',
        `order_coupon_reduce_last_7d_amount` DECIMAL(16,2) COMMENT 'Discount amount (coupon), last 7 days',
        `order_last_7d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount, last 7 days',
        `order_last_7d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount, last 7 days',
        `order_last_30d_count` BIGINT COMMENT 'Order count, last 30 days',
        `order_last_30d_num` BIGINT COMMENT 'Units ordered, last 30 days',
        `order_activity_last_30d_count` BIGINT COMMENT 'Order count with activity, last 30 days',
        `order_coupon_last_30d_count` BIGINT COMMENT 'Order count with coupon, last 30 days',
        `order_activity_reduce_last_30d_amount` DECIMAL(16,2) COMMENT 'Discount amount (activity), last 30 days',
        `order_coupon_reduce_last_30d_amount` DECIMAL(16,2) COMMENT 'Discount amount (coupon), last 30 days',
        `order_last_30d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount, last 30 days',
        `order_last_30d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount, last 30 days',
        `order_count` BIGINT COMMENT 'Order count, cumulative',
        `order_num` BIGINT COMMENT 'Units ordered, cumulative',
        `order_activity_count` BIGINT COMMENT 'Order count with activity, cumulative',
        `order_coupon_count` BIGINT COMMENT 'Order count with coupon, cumulative',
        `order_activity_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount (activity), cumulative',
        `order_coupon_reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount (coupon), cumulative',
        `order_original_amount` DECIMAL(16,2) COMMENT 'Original order amount, cumulative',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Final order amount, cumulative',
        `payment_last_1d_count` BIGINT COMMENT 'Payment count, last 1 day',
        `payment_last_1d_num` BIGINT COMMENT 'Units paid, last 1 day',
        `payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Payment amount, last 1 day',
        `payment_last_7d_count` BIGINT COMMENT 'Payment count, last 7 days',
        `payment_last_7d_num` BIGINT COMMENT 'Units paid, last 7 days',
        `payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Payment amount, last 7 days',
        `payment_last_30d_count` BIGINT COMMENT 'Payment count, last 30 days',
        `payment_last_30d_num` BIGINT COMMENT 'Units paid, last 30 days',
        `payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Payment amount, last 30 days',
        `payment_count` BIGINT COMMENT 'Payment count, cumulative',
        `payment_num` BIGINT COMMENT 'Units paid, cumulative',
        `payment_amount` DECIMAL(16,2) COMMENT 'Payment amount, cumulative',
        `refund_order_last_1d_count` BIGINT COMMENT 'Order-refund count, last 1 day',
        `refund_order_last_1d_num` BIGINT COMMENT 'Units in order refunds, last 1 day',
        `refund_order_last_1d_amount` DECIMAL(16,2) COMMENT 'Order-refund amount, last 1 day',
        `refund_order_last_7d_count` BIGINT COMMENT 'Order-refund count, last 7 days',
        `refund_order_last_7d_num` BIGINT COMMENT 'Units in order refunds, last 7 days',
        `refund_order_last_7d_amount` DECIMAL(16,2) COMMENT 'Order-refund amount, last 7 days',
        `refund_order_last_30d_count` BIGINT COMMENT 'Order-refund count, last 30 days',
        `refund_order_last_30d_num` BIGINT COMMENT 'Units in order refunds, last 30 days',
        `refund_order_last_30d_amount` DECIMAL(16,2) COMMENT 'Order-refund amount, last 30 days',
        `refund_order_count` BIGINT COMMENT 'Order-refund count, cumulative',
        `refund_order_num` BIGINT COMMENT 'Units in order refunds, cumulative',
        `refund_order_amount` DECIMAL(16,2) COMMENT 'Order-refund amount, cumulative',
        `refund_payment_last_1d_count` BIGINT COMMENT 'Refund-payment count, last 1 day',
        `refund_payment_last_1d_num` BIGINT COMMENT 'Units in refund payments, last 1 day',
        `refund_payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Refund-payment amount, last 1 day',
        `refund_payment_last_7d_count` BIGINT COMMENT 'Refund-payment count, last 7 days',
        `refund_payment_last_7d_num` BIGINT COMMENT 'Units in refund payments, last 7 days',
        `refund_payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Refund-payment amount, last 7 days',
        `refund_payment_last_30d_count` BIGINT COMMENT 'Refund-payment count, last 30 days',
        `refund_payment_last_30d_num` BIGINT COMMENT 'Units in refund payments, last 30 days',
        `refund_payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Refund-payment amount, last 30 days',
        `refund_payment_count` BIGINT COMMENT 'Refund-payment count, cumulative',
        `refund_payment_num` BIGINT COMMENT 'Units in refund payments, cumulative',
        `refund_payment_amount` DECIMAL(16,2) COMMENT 'Refund-payment amount, cumulative',
        `cart_last_1d_count` BIGINT COMMENT 'Add-to-cart count, last 1 day',
        `cart_last_7d_count` BIGINT COMMENT 'Add-to-cart count, last 7 days',
        `cart_last_30d_count` BIGINT COMMENT 'Add-to-cart count, last 30 days',
        `cart_count` BIGINT COMMENT 'Add-to-cart count, cumulative',
        `favor_last_1d_count` BIGINT COMMENT 'Favorite count, last 1 day',
        `favor_last_7d_count` BIGINT COMMENT 'Favorite count, last 7 days',
        `favor_last_30d_count` BIGINT COMMENT 'Favorite count, last 30 days',
        `favor_count` BIGINT COMMENT 'Favorite count, cumulative',
        `appraise_last_1d_good_count` BIGINT COMMENT 'Positive review count, last 1 day',
        `appraise_last_1d_mid_count` BIGINT COMMENT 'Neutral review count, last 1 day',
        `appraise_last_1d_bad_count` BIGINT COMMENT 'Negative review count, last 1 day',
        `appraise_last_1d_default_count` BIGINT COMMENT 'Default review count, last 1 day',
        `appraise_last_7d_good_count` BIGINT COMMENT 'Positive review count, last 7 days',
        `appraise_last_7d_mid_count` BIGINT COMMENT 'Neutral review count, last 7 days',
        `appraise_last_7d_bad_count` BIGINT COMMENT 'Negative review count, last 7 days',
        `appraise_last_7d_default_count` BIGINT COMMENT 'Default review count, last 7 days',
        `appraise_last_30d_good_count` BIGINT COMMENT 'Positive review count, last 30 days',
        `appraise_last_30d_mid_count` BIGINT COMMENT 'Neutral review count, last 30 days',
        `appraise_last_30d_bad_count` BIGINT COMMENT 'Negative review count, last 30 days',
        `appraise_last_30d_default_count` BIGINT COMMENT 'Default review count, last 30 days',
        `appraise_good_count` BIGINT COMMENT 'Positive review count, cumulative',
        `appraise_mid_count` BIGINT COMMENT 'Neutral review count, cumulative',
        `appraise_bad_count` BIGINT COMMENT 'Negative review count, cumulative',
        `appraise_default_count` BIGINT COMMENT 'Default review count, cumulative'
    ) COMMENT 'SKU topic wide table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwt/dwt_sku_topic/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data loading

    1. First-day load

      insert overwrite table dwt_sku_topic partition(dt='2020-06-14')
      select
          id,
          nvl(order_last_1d_count,0),
          nvl(order_last_1d_num,0),
          nvl(order_activity_last_1d_count,0),
          nvl(order_coupon_last_1d_count,0),
          nvl(order_activity_reduce_last_1d_amount,0),
          nvl(order_coupon_reduce_last_1d_amount,0),
          nvl(order_last_1d_original_amount,0),
          nvl(order_last_1d_final_amount,0),
          nvl(order_last_7d_count,0),
          nvl(order_last_7d_num,0),
          nvl(order_activity_last_7d_count,0),
          nvl(order_coupon_last_7d_count,0),
          nvl(order_activity_reduce_last_7d_amount,0),
          nvl(order_coupon_reduce_last_7d_amount,0),
          nvl(order_last_7d_original_amount,0),
          nvl(order_last_7d_final_amount,0),
          nvl(order_last_30d_count,0),
          nvl(order_last_30d_num,0),
          nvl(order_activity_last_30d_count,0),
          nvl(order_coupon_last_30d_count,0),
          nvl(order_activity_reduce_last_30d_amount,0),
          nvl(order_coupon_reduce_last_30d_amount,0),
          nvl(order_last_30d_original_amount,0),
          nvl(order_last_30d_final_amount,0),
          nvl(order_count,0),
          nvl(order_num,0),
          nvl(order_activity_count,0),
          nvl(order_coupon_count,0),
          nvl(order_activity_reduce_amount,0),
          nvl(order_coupon_reduce_amount,0),
          nvl(order_original_amount,0),
          nvl(order_final_amount,0),
          nvl(payment_last_1d_count,0),
          nvl(payment_last_1d_num,0),
          nvl(payment_last_1d_amount,0),
          nvl(payment_last_7d_count,0),
          nvl(payment_last_7d_num,0),
          nvl(payment_last_7d_amount,0),
          nvl(payment_last_30d_count,0),
          nvl(payment_last_30d_num,0),
          nvl(payment_last_30d_amount,0),
          nvl(payment_count,0),
          nvl(payment_num,0),
          nvl(payment_amount,0),
          nvl(refund_order_last_1d_count,0),
          nvl(refund_order_last_1d_num,0),
          nvl(refund_order_last_1d_amount,0),
          nvl(refund_order_last_7d_count,0),
          nvl(refund_order_last_7d_num,0),
          nvl(refund_order_last_7d_amount,0),
          nvl(refund_order_last_30d_count,0),
          nvl(refund_order_last_30d_num,0),
          nvl(refund_order_last_30d_amount,0),
          nvl(refund_order_count,0),
          nvl(refund_order_num,0),
          nvl(refund_order_amount,0),
          nvl(refund_payment_last_1d_count,0),
          nvl(refund_payment_last_1d_num,0),
          nvl(refund_payment_last_1d_amount,0),
          nvl(refund_payment_last_7d_count,0),
          nvl(refund_payment_last_7d_num,0),
          nvl(refund_payment_last_7d_amount,0),
          nvl(refund_payment_last_30d_count,0),
          nvl(refund_payment_last_30d_num,0),
          nvl(refund_payment_last_30d_amount,0),
          nvl(refund_payment_count,0),
          nvl(refund_payment_num,0),
          nvl(refund_payment_amount,0),
          nvl(cart_last_1d_count,0),
          nvl(cart_last_7d_count,0),
          nvl(cart_last_30d_count,0),
          nvl(cart_count,0),
          nvl(favor_last_1d_count,0),
          nvl(favor_last_7d_count,0),
          nvl(favor_last_30d_count,0),
          nvl(favor_count,0),
          nvl(appraise_last_1d_good_count,0),
          nvl(appraise_last_1d_mid_count,0),
          nvl(appraise_last_1d_bad_count,0),
          nvl(appraise_last_1d_default_count,0),
          nvl(appraise_last_7d_good_count,0),
          nvl(appraise_last_7d_mid_count,0),
          nvl(appraise_last_7d_bad_count,0),
          nvl(appraise_last_7d_default_count,0),
          nvl(appraise_last_30d_good_count,0),
          nvl(appraise_last_30d_mid_count,0),
          nvl(appraise_last_30d_bad_count,0),
          nvl(appraise_last_30d_default_count,0),
          nvl(appraise_good_count,0),
          nvl(appraise_mid_count,0),
          nvl(appraise_bad_count,0),
          nvl(appraise_default_count,0)
      from
      (
          select
              id
          from dim_sku_info
          where dt='2020-06-14'
      )t1
      left join
      (
          select
              sku_id,
              sum(if(dt='2020-06-14',order_count,0)) order_last_1d_count,
              sum(if(dt='2020-06-14',order_num,0)) order_last_1d_num,
              sum(if(dt='2020-06-14',order_activity_count,0)) order_activity_last_1d_count,
              sum(if(dt='2020-06-14',order_coupon_count,0)) order_coupon_last_1d_count,
              sum(if(dt='2020-06-14',order_activity_reduce_amount,0)) order_activity_reduce_last_1d_amount,
              sum(if(dt='2020-06-14',order_coupon_reduce_amount,0)) order_coupon_reduce_last_1d_amount,
              sum(if(dt='2020-06-14',order_original_amount,0)) order_last_1d_original_amount,
              sum(if(dt='2020-06-14',order_final_amount,0)) order_last_1d_final_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_count,0)) order_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),order_num,0)) order_last_7d_num,
              sum(if(dt>=date_add('2020-06-14',-6),order_activity_count,0)) order_activity_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),order_coupon_count,0)) order_coupon_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),order_activity_reduce_amount,0)) order_activity_reduce_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_coupon_reduce_amount,0)) order_coupon_reduce_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_original_amount,0)) order_last_7d_original_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_final_amount,0)) order_last_7d_final_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_count,0)) order_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),order_num,0)) order_last_30d_num,
              sum(if(dt>=date_add('2020-06-14',-29),order_activity_count,0)) order_activity_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),order_coupon_count,0)) order_coupon_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),order_activity_reduce_amount,0)) order_activity_reduce_last_30d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_coupon_reduce_amount,0)) order_coupon_reduce_last_30d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_original_amount,0)) order_last_30d_original_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_final_amount,0)) order_last_30d_final_amount,
              sum(order_count) order_count,
              sum(order_num) order_num,
              sum(order_activity_count) order_activity_count,
              sum(order_coupon_count) order_coupon_count,
              sum(order_activity_reduce_amount) order_activity_reduce_amount,
              sum(order_coupon_reduce_amount) order_coupon_reduce_amount,
              sum(order_original_amount) order_original_amount,
              sum(order_final_amount) order_final_amount,
              sum(if(dt='2020-06-14',payment_count,0)) payment_last_1d_count,
              sum(if(dt='2020-06-14',payment_num,0)) payment_last_1d_num,
              sum(if(dt='2020-06-14',payment_amount,0)) payment_last_1d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),payment_count,0)) payment_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),payment_num,0)) payment_last_7d_num,
              sum(if(dt>=date_add('2020-06-14',-6),payment_amount,0)) payment_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),payment_count,0)) payment_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),payment_num,0)) payment_last_30d_num,
              sum(if(dt>=date_add('2020-06-14',-29),payment_amount,0)) payment_last_30d_amount,
              sum(payment_count) payment_count,
              sum(payment_num) payment_num,
              sum(payment_amount) payment_amount,
              sum(if(dt='2020-06-14',refund_order_count,0)) refund_order_last_1d_count,
              sum(if(dt='2020-06-14',refund_order_num,0)) refund_order_last_1d_num,
              sum(if(dt='2020-06-14',refund_order_amount,0)) refund_order_last_1d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),refund_order_count,0)) refund_order_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),refund_order_num,0)) refund_order_last_7d_num,
              sum(if(dt>=date_add('2020-06-14',-6),refund_order_amount,0)) refund_order_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),refund_order_count,0)) refund_order_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),refund_order_num,0)) refund_order_last_30d_num,
              sum(if(dt>=date_add('2020-06-14',-29),refund_order_amount,0)) refund_order_last_30d_amount,
              sum(refund_order_count) refund_order_count,
              sum(refund_order_num) refund_order_num,
              sum(refund_order_amount) refund_order_amount,
              sum(if(dt='2020-06-14',refund_payment_count,0)) refund_payment_last_1d_count,
              sum(if(dt='2020-06-14',refund_payment_num,0)) refund_payment_last_1d_num,
              sum(if(dt='2020-06-14',refund_payment_amount,0)) refund_payment_last_1d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),refund_payment_count,0)) refund_payment_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),refund_payment_num,0)) refund_payment_last_7d_num,
              sum(if(dt>=date_add('2020-06-14',-6),refund_payment_amount,0)) refund_payment_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),refund_payment_count,0)) refund_payment_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),refund_payment_num,0)) refund_payment_last_30d_num,
              sum(if(dt>=date_add('2020-06-14',-29),refund_payment_amount,0)) refund_payment_last_30d_amount,
              sum(refund_payment_count) refund_payment_count,
              sum(refund_payment_num) refund_payment_num,
              sum(refund_payment_amount) refund_payment_amount,
              sum(if(dt='2020-06-14',cart_count,0)) cart_last_1d_count,
              sum(if(dt>=date_add('2020-06-14',-6),cart_count,0)) cart_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-29),cart_count,0)) cart_last_30d_count,
              sum(cart_count) cart_count,
              sum(if(dt='2020-06-14',favor_count,0)) favor_last_1d_count,
              sum(if(dt>=date_add('2020-06-14',-6),favor_count,0)) favor_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-29),favor_count,0)) favor_last_30d_count,
              sum(favor_count) favor_count,
              sum(if(dt='2020-06-14',appraise_good_count,0)) appraise_last_1d_good_count,
              sum(if(dt='2020-06-14',appraise_mid_count,0)) appraise_last_1d_mid_count,
              sum(if(dt='2020-06-14',appraise_bad_count,0)) appraise_last_1d_bad_count,
              sum(if(dt='2020-06-14',appraise_default_count,0)) appraise_last_1d_default_count,
              sum(if(dt>=date_add('2020-06-14',-6),appraise_good_count,0)) appraise_last_7d_good_count,
              sum(if(dt>=date_add('2020-06-14',-6),appraise_mid_count,0)) appraise_last_7d_mid_count,
              sum(if(dt>=date_add('2020-06-14',-6),appraise_bad_count,0)) appraise_last_7d_bad_count,
              sum(if(dt>=date_add('2020-06-14',-6),appraise_default_count,0)) appraise_last_7d_default_count,
              sum(if(dt>=date_add('2020-06-14',-29),appraise_good_count,0)) appraise_last_30d_good_count,
              sum(if(dt>=date_add('2020-06-14',-29),appraise_mid_count,0)) appraise_last_30d_mid_count,
              sum(if(dt>=date_add('2020-06-14',-29),appraise_bad_count,0)) appraise_last_30d_bad_count,
              sum(if(dt>=date_add('2020-06-14',-29),appraise_default_count,0)) appraise_last_30d_default_count,
              sum(appraise_good_count) appraise_good_count,
              sum(appraise_mid_count) appraise_mid_count,
              sum(appraise_bad_count) appraise_bad_count,
              sum(appraise_default_count) appraise_default_count
          from dws_sku_action_daycount
          group by sku_id
      )t2
      on t1.id=t2.sku_id;
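The subquery above derives every window from a single pass over `dws_sku_action_daycount`: a row counts toward the 1-day column when `dt` equals the load date, toward the 7-day column when `dt >= date_add(load_date, -6)`, and toward the 30-day column when `dt >= date_add(load_date, -29)`. Below is a minimal Python sketch of that rule for one metric of one SKU. The function name `window_sums` and the sample counts are hypothetical, and like the SQL it assumes no partition is later than the load date.

```python
from datetime import date, timedelta


def window_sums(daily, load_date):
    """Compute 1d/7d/30d/cumulative sums from a {date: count} mapping.

    Mirrors sum(if(dt >= date_add(load_date, -N), metric, 0)) in the SQL.
    """
    def since(days_back):
        start = load_date - timedelta(days=days_back)
        return sum(v for d, v in daily.items() if d >= start)

    return {
        "last_1d": daily.get(load_date, 0),
        "last_7d": since(6),    # load date plus the 6 days before it
        "last_30d": since(29),  # load date plus the 29 days before it
        "total": sum(daily.values()),
    }


# Hypothetical order_count history for a single sku_id.
history = {
    date(2020, 5, 15): 4,  # outside the 30-day window
    date(2020, 5, 16): 1,  # first day inside the 30-day window
    date(2020, 6, 7): 5,   # outside the 7-day window
    date(2020, 6, 8): 2,   # first day inside the 7-day window
    date(2020, 6, 14): 3,  # the load date itself
}
result = window_sums(history, date(2020, 6, 14))
print(result)
```

The offsets are -6 and -29 rather than -7 and -30 because the load date itself is one of the days in each window.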
      
    2. Daily load

      insert overwrite table dwt_sku_topic partition(dt='2020-06-15')
      select
          nvl(1d_ago.sku_id,old.sku_id),
          nvl(1d_ago.order_count,0),
          nvl(1d_ago.order_num,0),
          nvl(1d_ago.order_activity_count,0),
          nvl(1d_ago.order_coupon_count,0),
          nvl(1d_ago.order_activity_reduce_amount,0.0),
          nvl(1d_ago.order_coupon_reduce_amount,0.0),
          nvl(1d_ago.order_original_amount,0.0),
          nvl(1d_ago.order_final_amount,0.0),
          nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)- nvl(7d_ago.order_count,0),
          nvl(old.order_last_7d_num,0)+nvl(1d_ago.order_num,0)- nvl(7d_ago.order_num,0),
          nvl(old.order_activity_last_7d_count,0)+nvl(1d_ago.order_activity_count,0)- nvl(7d_ago.order_activity_count,0),
          nvl(old.order_coupon_last_7d_count,0)+nvl(1d_ago.order_coupon_count,0)- nvl(7d_ago.order_coupon_count,0),
          nvl(old.order_activity_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)- nvl(7d_ago.order_activity_reduce_amount,0.0),
          nvl(old.order_coupon_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)- nvl(7d_ago.order_coupon_reduce_amount,0.0),
          nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(7d_ago.order_original_amount,0.0),
          nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(7d_ago.order_final_amount,0.0),
          nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)- nvl(30d_ago.order_count,0),
          nvl(old.order_last_30d_num,0)+nvl(1d_ago.order_num,0)- nvl(30d_ago.order_num,0),
          nvl(old.order_activity_last_30d_count,0)+nvl(1d_ago.order_activity_count,0)- nvl(30d_ago.order_activity_count,0),
          nvl(old.order_coupon_last_30d_count,0)+nvl(1d_ago.order_coupon_count,0)- nvl(30d_ago.order_coupon_count,0),
          nvl(old.order_activity_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)- nvl(30d_ago.order_activity_reduce_amount,0.0),
          nvl(old.order_coupon_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)- nvl(30d_ago.order_coupon_reduce_amount,0.0),
          nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(30d_ago.order_original_amount,0.0),
          nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(30d_ago.order_final_amount,0.0),
          nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
          nvl(old.order_num,0)+nvl(1d_ago.order_num,0),
          nvl(old.order_activity_count,0)+nvl(1d_ago.order_activity_count,0),
          nvl(old.order_coupon_count,0)+nvl(1d_ago.order_coupon_count,0),
          nvl(old.order_activity_reduce_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0),
          nvl(old.order_coupon_reduce_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0),
          nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
          nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
          nvl(1d_ago.payment_count,0),
          nvl(1d_ago.payment_num,0),
          nvl(1d_ago.payment_amount,0.0),
          nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)- nvl(7d_ago.payment_count,0),
          nvl(old.payment_last_7d_num,0)+nvl(1d_ago.payment_num,0)- nvl(7d_ago.payment_num,0),
          nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(7d_ago.payment_amount,0.0),
          nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)- nvl(30d_ago.payment_count,0),
          nvl(old.payment_last_30d_num,0)+nvl(1d_ago.payment_num,0)- nvl(30d_ago.payment_num,0),
          nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(30d_ago.payment_amount,0.0),
          nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
          nvl(old.payment_num,0)+nvl(1d_ago.payment_num,0),
          nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
          nvl(1d_ago.refund_order_count,0),
          nvl(1d_ago.refund_order_num,0),
          nvl(1d_ago.refund_order_amount,0.0),
          nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(7d_ago.refund_order_count,0),
          nvl(old.refund_order_last_7d_num,0)+nvl(1d_ago.refund_order_num,0)- nvl(7d_ago.refund_order_num,0),
          nvl(old.refund_order_last_7d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)- nvl(7d_ago.refund_order_amount,0.0),
          nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(30d_ago.refund_order_count,0),
          nvl(old.refund_order_last_30d_num,0)+nvl(1d_ago.refund_order_num,0)- nvl(30d_ago.refund_order_num,0),
          nvl(old.refund_order_last_30d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)- nvl(30d_ago.refund_order_amount,0.0),
          nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
          nvl(old.refund_order_num,0)+nvl(1d_ago.refund_order_num,0),
          nvl(old.refund_order_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0),
          nvl(1d_ago.refund_payment_count,0),
          nvl(1d_ago.refund_payment_num,0),
          nvl(1d_ago.refund_payment_amount,0.0),
          nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)- nvl(7d_ago.refund_payment_count,0),
          nvl(old.refund_payment_last_7d_num,0)+nvl(1d_ago.refund_payment_num,0)- nvl(7d_ago.refund_payment_num,0),
          nvl(old.refund_payment_last_7d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)- nvl(7d_ago.refund_payment_amount,0.0),
          nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)- nvl(30d_ago.refund_payment_count,0),
          nvl(old.refund_payment_last_30d_num,0)+nvl(1d_ago.refund_payment_num,0)- nvl(30d_ago.refund_payment_num,0),
          nvl(old.refund_payment_last_30d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)- nvl(30d_ago.refund_payment_amount,0.0),
          nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
          nvl(old.refund_payment_num,0)+nvl(1d_ago.refund_payment_num,0),
          nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0),
          nvl(1d_ago.cart_count,0),
          nvl(old.cart_last_7d_count,0)+nvl(1d_ago.cart_count,0)- nvl(7d_ago.cart_count,0),
          nvl(old.cart_last_30d_count,0)+nvl(1d_ago.cart_count,0)- nvl(30d_ago.cart_count,0),
          nvl(old.cart_count,0)+nvl(1d_ago.cart_count,0),
          nvl(1d_ago.favor_count,0),
          nvl(old.favor_last_7d_count,0)+nvl(1d_ago.favor_count,0)- nvl(7d_ago.favor_count,0),
          nvl(old.favor_last_30d_count,0)+nvl(1d_ago.favor_count,0)- nvl(30d_ago.favor_count,0),
          nvl(old.favor_count,0)+nvl(1d_ago.favor_count,0),
          nvl(1d_ago.appraise_good_count,0),
          nvl(1d_ago.appraise_mid_count,0),
          nvl(1d_ago.appraise_bad_count,0),
          nvl(1d_ago.appraise_default_count,0),
          nvl(old.appraise_last_7d_good_count,0)+nvl(1d_ago.appraise_good_count,0)- nvl(7d_ago.appraise_good_count,0),
          nvl(old.appraise_last_7d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)- nvl(7d_ago.appraise_mid_count,0),
          nvl(old.appraise_last_7d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)- nvl(7d_ago.appraise_bad_count,0),
          nvl(old.appraise_last_7d_default_count,0)+nvl(1d_ago.appraise_default_count,0)- nvl(7d_ago.appraise_default_count,0),
          nvl(old.appraise_last_30d_good_count,0)+nvl(1d_ago.appraise_good_count,0)- nvl(30d_ago.appraise_good_count,0),
          nvl(old.appraise_last_30d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)- nvl(30d_ago.appraise_mid_count,0),
          nvl(old.appraise_last_30d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)- nvl(30d_ago.appraise_bad_count,0),
          nvl(old.appraise_last_30d_default_count,0)+nvl(1d_ago.appraise_default_count,0)- nvl(30d_ago.appraise_default_count,0),
          nvl(old.appraise_good_count,0)+nvl(1d_ago.appraise_good_count,0),
          nvl(old.appraise_mid_count,0)+nvl(1d_ago.appraise_mid_count,0),
          nvl(old.appraise_bad_count,0)+nvl(1d_ago.appraise_bad_count,0),
          nvl(old.appraise_default_count,0)+nvl(1d_ago.appraise_default_count,0)
      from
      (
          select
              sku_id,
              order_last_1d_count,
              order_last_1d_num,
              order_activity_last_1d_count,
              order_coupon_last_1d_count,
              order_activity_reduce_last_1d_amount,
              order_coupon_reduce_last_1d_amount,
              order_last_1d_original_amount,
              order_last_1d_final_amount,
              order_last_7d_count,
              order_last_7d_num,
              order_activity_last_7d_count,
              order_coupon_last_7d_count,
              order_activity_reduce_last_7d_amount,
              order_coupon_reduce_last_7d_amount,
              order_last_7d_original_amount,
              order_last_7d_final_amount,
              order_last_30d_count,
              order_last_30d_num,
              order_activity_last_30d_count,
              order_coupon_last_30d_count,
              order_activity_reduce_last_30d_amount,
              order_coupon_reduce_last_30d_amount,
              order_last_30d_original_amount,
              order_last_30d_final_amount,
              order_count,
              order_num,
              order_activity_count,
              order_coupon_count,
              order_activity_reduce_amount,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_last_1d_count,
              payment_last_1d_num,
              payment_last_1d_amount,
              payment_last_7d_count,
              payment_last_7d_num,
              payment_last_7d_amount,
              payment_last_30d_count,
              payment_last_30d_num,
              payment_last_30d_amount,
              payment_count,
              payment_num,
              payment_amount,
              refund_order_last_1d_count,
              refund_order_last_1d_num,
              refund_order_last_1d_amount,
              refund_order_last_7d_count,
              refund_order_last_7d_num,
              refund_order_last_7d_amount,
              refund_order_last_30d_count,
              refund_order_last_30d_num,
              refund_order_last_30d_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              refund_payment_last_1d_count,
              refund_payment_last_1d_num,
              refund_payment_last_1d_amount,
              refund_payment_last_7d_count,
              refund_payment_last_7d_num,
              refund_payment_last_7d_amount,
              refund_payment_last_30d_count,
              refund_payment_last_30d_num,
              refund_payment_last_30d_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              cart_last_1d_count,
              cart_last_7d_count,
              cart_last_30d_count,
              cart_count,
              favor_last_1d_count,
              favor_last_7d_count,
              favor_last_30d_count,
              favor_count,
              appraise_last_1d_good_count,
              appraise_last_1d_mid_count,
              appraise_last_1d_bad_count,
              appraise_last_1d_default_count,
              appraise_last_7d_good_count,
              appraise_last_7d_mid_count,
              appraise_last_7d_bad_count,
              appraise_last_7d_default_count,
              appraise_last_30d_good_count,
              appraise_last_30d_mid_count,
              appraise_last_30d_bad_count,
              appraise_last_30d_default_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from dwt_sku_topic
          where dt=date_add('2020-06-15',-1)
      )old
      full outer join
      (
          select
              sku_id,
              order_count,
              order_num,
              order_activity_count,
              order_coupon_count,
              order_activity_reduce_amount,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_num,
              payment_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              cart_count,
              favor_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from dws_sku_action_daycount
          where dt='2020-06-15'
      )1d_ago
      on old.sku_id=1d_ago.sku_id
      left join
      (
          select
              sku_id,
              order_count,
              order_num,
              order_activity_count,
              order_coupon_count,
              order_activity_reduce_amount,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_num,
              payment_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              cart_count,
              favor_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from dws_sku_action_daycount
          where dt=date_add('2020-06-15',-7)
      )7d_ago
      on old.sku_id=7d_ago.sku_id
      left join
      (
          select
              sku_id,
              order_count,
              order_num,
              order_activity_count,
              order_coupon_count,
              order_activity_reduce_amount,
              order_coupon_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_num,
              payment_amount,
              refund_order_count,
              refund_order_num,
              refund_order_amount,
              refund_payment_count,
              refund_payment_num,
              refund_payment_amount,
              cart_count,
              favor_count,
              appraise_good_count,
              appraise_mid_count,
              appraise_bad_count,
              appraise_default_count
          from dws_sku_action_daycount
          where dt=date_add('2020-06-15',-30)
      )30d_ago
      on old.sku_id=30d_ago.sku_id;
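The daily load above does not rescan the full history. For each 7-day and 30-day column it takes yesterday's window value from `old`, adds today's value from `1d_ago`, and subtracts the value of the day that just dropped out of the window (`7d_ago` or `30d_ago`). The following Python sketch checks this update against a full recomputation for one metric. The helper names and the sample counts are hypothetical.

```python
from datetime import date, timedelta


def full_window(daily, day, size):
    """Recompute a window of `size` days ending at `day` from scratch."""
    start = day - timedelta(days=size - 1)
    return sum(v for d, v in daily.items() if start <= d <= day)


def roll_forward(old_window, today_value, expired_value):
    """Incremental update: old window + today's value - the value leaving the window."""
    return old_window + today_value - expired_value


# Hypothetical order_count history for a single sku_id.
daily = {
    date(2020, 6, 8): 2,
    date(2020, 6, 10): 4,
    date(2020, 6, 14): 3,
    date(2020, 6, 15): 1,
}
today = date(2020, 6, 15)
yesterday = today - timedelta(days=1)

old_7d = full_window(daily, yesterday, 7)           # covers 06-08 .. 06-14
expired = daily.get(today - timedelta(days=7), 0)   # 06-08 leaves the window
new_7d = roll_forward(old_7d, daily.get(today, 0), expired)

# The incremental result must equal a full recomputation for today.
recomputed_7d = full_window(daily, today, 7)        # covers 06-09 .. 06-15
print(old_7d, new_7d, recomputed_7d)
```

This is why the SQL reads the partition at `date_add('2020-06-15',-7)`: yesterday's 7-day window started on 2020-06-08, and that is the one day today's window no longer covers.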
      

# Coupon topic

  1. Table creation statement

    DROP TABLE IF EXISTS dwt_coupon_topic;
    CREATE EXTERNAL TABLE dwt_coupon_topic(
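        `coupon_id` STRING COMMENT 'Coupon ID',
        `get_last_1d_count` BIGINT COMMENT 'Number of times claimed in the last 1 day',
        `get_last_7d_count` BIGINT COMMENT 'Number of times claimed in the last 7 days',
        `get_last_30d_count` BIGINT COMMENT 'Number of times claimed in the last 30 days',
        `get_count` BIGINT COMMENT 'Cumulative number of times claimed',
        `order_last_1d_count` BIGINT COMMENT 'Number of orders placed with this coupon in the last 1 day',
        `order_last_1d_reduce_amount` DECIMAL(16,2) COMMENT 'Order discount amount with this coupon in the last 1 day',
        `order_last_1d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount with this coupon in the last 1 day',
        `order_last_1d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount with this coupon in the last 1 day',
        `order_last_7d_count` BIGINT COMMENT 'Number of orders placed with this coupon in the last 7 days',
        `order_last_7d_reduce_amount` DECIMAL(16,2) COMMENT 'Order discount amount with this coupon in the last 7 days',
        `order_last_7d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount with this coupon in the last 7 days',
        `order_last_7d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount with this coupon in the last 7 days',
        `order_last_30d_count` BIGINT COMMENT 'Number of orders placed with this coupon in the last 30 days',
        `order_last_30d_reduce_amount` DECIMAL(16,2) COMMENT 'Order discount amount with this coupon in the last 30 days',
        `order_last_30d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount with this coupon in the last 30 days',
        `order_last_30d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount with this coupon in the last 30 days',
        `order_count` BIGINT COMMENT 'Cumulative number of uses (orders)',
        `order_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative order discount amount with this coupon',
        `order_original_amount` DECIMAL(16,2) COMMENT 'Cumulative original order amount with this coupon',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Cumulative final order amount with this coupon',
        `payment_last_1d_count` BIGINT COMMENT 'Number of payments with this coupon in the last 1 day',
        `payment_last_1d_reduce_amount` DECIMAL(16,2) COMMENT 'Payment discount amount with this coupon in the last 1 day',
        `payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Payment amount with this coupon in the last 1 day',
        `payment_last_7d_count` BIGINT COMMENT 'Number of payments with this coupon in the last 7 days',
        `payment_last_7d_reduce_amount` DECIMAL(16,2) COMMENT 'Payment discount amount with this coupon in the last 7 days',
        `payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Payment amount with this coupon in the last 7 days',
        `payment_last_30d_count` BIGINT COMMENT 'Number of payments with this coupon in the last 30 days',
        `payment_last_30d_reduce_amount` DECIMAL(16,2) COMMENT 'Payment discount amount with this coupon in the last 30 days',
        `payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Payment amount with this coupon in the last 30 days',
        `payment_count` BIGINT COMMENT 'Cumulative number of uses (payments)',
        `payment_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative payment discount amount with this coupon',
        `payment_amount` DECIMAL(16,2) COMMENT 'Cumulative payment amount with this coupon',
        `expire_last_1d_count` BIGINT COMMENT 'Number of expirations in the last 1 day',
        `expire_last_7d_count` BIGINT COMMENT 'Number of expirations in the last 7 days',
        `expire_last_30d_count` BIGINT COMMENT 'Number of expirations in the last 30 days',
        `expire_count` BIGINT COMMENT 'Cumulative number of expirations'
    )comment 'Coupon topic table'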
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwt/dwt_coupon_topic/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data loading

    1. First-day load

      insert overwrite table dwt_coupon_topic partition(dt='2020-06-14')
      select
          id,
          nvl(get_last_1d_count,0),
          nvl(get_last_7d_count,0),
          nvl(get_last_30d_count,0),
          nvl(get_count,0),
          nvl(order_last_1d_count,0),
          nvl(order_last_1d_reduce_amount,0),
          nvl(order_last_1d_original_amount,0),
          nvl(order_last_1d_final_amount,0),
          nvl(order_last_7d_count,0),
          nvl(order_last_7d_reduce_amount,0),
          nvl(order_last_7d_original_amount,0),
          nvl(order_last_7d_final_amount,0),
          nvl(order_last_30d_count,0),
          nvl(order_last_30d_reduce_amount,0),
          nvl(order_last_30d_original_amount,0),
          nvl(order_last_30d_final_amount,0),
          nvl(order_count,0),
          nvl(order_reduce_amount,0),
          nvl(order_original_amount,0),
          nvl(order_final_amount,0),
          nvl(payment_last_1d_count,0),
          nvl(payment_last_1d_reduce_amount,0),
          nvl(payment_last_1d_amount,0),
          nvl(payment_last_7d_count,0),
          nvl(payment_last_7d_reduce_amount,0),
          nvl(payment_last_7d_amount,0),
          nvl(payment_last_30d_count,0),
          nvl(payment_last_30d_reduce_amount,0),
          nvl(payment_last_30d_amount,0),
          nvl(payment_count,0),
          nvl(payment_reduce_amount,0),
          nvl(payment_amount,0),
          nvl(expire_last_1d_count,0),
          nvl(expire_last_7d_count,0),
          nvl(expire_last_30d_count,0),
          nvl(expire_count,0)
      from
      (
          select
              id
          from dim_coupon_info
          where dt='2020-06-14'
      )t1
      left join
      (
          select
              coupon_id coupon_id,
              sum(if(dt='2020-06-14',get_count,0)) get_last_1d_count,
              sum(if(dt>=date_add('2020-06-14',-6),get_count,0)) get_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-29),get_count,0)) get_last_30d_count,
              sum(get_count) get_count,
              sum(if(dt='2020-06-14',order_count,0)) order_last_1d_count,
              sum(if(dt='2020-06-14',order_reduce_amount,0)) order_last_1d_reduce_amount,
              sum(if(dt='2020-06-14',order_original_amount,0)) order_last_1d_original_amount,
              sum(if(dt='2020-06-14',order_final_amount,0)) order_last_1d_final_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_count,0)) order_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),order_reduce_amount,0)) order_last_7d_reduce_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_original_amount,0)) order_last_7d_original_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_final_amount,0)) order_last_7d_final_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_count,0)) order_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),order_reduce_amount,0)) order_last_30d_reduce_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_original_amount,0)) order_last_30d_original_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_final_amount,0)) order_last_30d_final_amount,
              sum(order_count) order_count,
              sum(order_reduce_amount) order_reduce_amount,
              sum(order_original_amount) order_original_amount,
              sum(order_final_amount) order_final_amount,
              sum(if(dt='2020-06-14',payment_count,0)) payment_last_1d_count,
              sum(if(dt='2020-06-14',payment_reduce_amount,0)) payment_last_1d_reduce_amount,
              sum(if(dt='2020-06-14',payment_amount,0)) payment_last_1d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),payment_count,0)) payment_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),payment_reduce_amount,0)) payment_last_7d_reduce_amount,
              sum(if(dt>=date_add('2020-06-14',-6),payment_amount,0)) payment_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),payment_count,0)) payment_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),payment_reduce_amount,0)) payment_last_30d_reduce_amount,
              sum(if(dt>=date_add('2020-06-14',-29),payment_amount,0)) payment_last_30d_amount,
              sum(payment_count) payment_count,
              sum(payment_reduce_amount) payment_reduce_amount,
              sum(payment_amount) payment_amount,
              sum(if(dt='2020-06-14',expire_count,0)) expire_last_1d_count,
              sum(if(dt>=date_add('2020-06-14',-6),expire_count,0)) expire_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-29),expire_count,0)) expire_last_30d_count,
              sum(expire_count) expire_count
          from dws_coupon_info_daycount
          group by coupon_id
      )t2
      on t1.id=t2.coupon_id;
      
    2. Daily load

      insert overwrite table dwt_coupon_topic partition(dt='2020-06-15')
      select
          nvl(1d_ago.coupon_id,old.coupon_id),
          nvl(1d_ago.get_count,0),
          nvl(old.get_last_7d_count,0)+nvl(1d_ago.get_count,0)- nvl(7d_ago.get_count,0),
          nvl(old.get_last_30d_count,0)+nvl(1d_ago.get_count,0)- nvl(30d_ago.get_count,0),
          nvl(old.get_count,0)+nvl(1d_ago.get_count,0),
          nvl(1d_ago.order_count,0),
          nvl(1d_ago.order_reduce_amount,0.0),
          nvl(1d_ago.order_original_amount,0.0),
          nvl(1d_ago.order_final_amount,0.0),
          nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)- nvl(7d_ago.order_count,0),
          nvl(old.order_last_7d_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0)- nvl(7d_ago.order_reduce_amount,0.0),
          nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(7d_ago.order_original_amount,0.0),
          nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(7d_ago.order_final_amount,0.0),
          nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)- nvl(30d_ago.order_count,0),
          nvl(old.order_last_30d_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0)- nvl(30d_ago.order_reduce_amount,0.0),
          nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(30d_ago.order_original_amount,0.0),
          nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(30d_ago.order_final_amount,0.0),
          nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
          nvl(old.order_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0),
          nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
          nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
          nvl(1d_ago.payment_count,0),
          nvl(1d_ago.payment_reduce_amount,0.0),
          nvl(1d_ago.payment_amount,0.0),
          nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)- nvl(7d_ago.payment_count,0),
          nvl(old.payment_last_7d_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0)- nvl(7d_ago.payment_reduce_amount,0.0),
          nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(7d_ago.payment_amount,0.0),
          nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)- nvl(30d_ago.payment_count,0),
          nvl(old.payment_last_30d_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0)- nvl(30d_ago.payment_reduce_amount,0.0),
          nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(30d_ago.payment_amount,0.0),
          nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
          nvl(old.payment_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0),
          nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
          nvl(1d_ago.expire_count,0),
          nvl(old.expire_last_7d_count,0)+nvl(1d_ago.expire_count,0)- nvl(7d_ago.expire_count,0),
          nvl(old.expire_last_30d_count,0)+nvl(1d_ago.expire_count,0)- nvl(30d_ago.expire_count,0),
          nvl(old.expire_count,0)+nvl(1d_ago.expire_count,0)
      from
      (
          select
              coupon_id,
              get_last_1d_count,
              get_last_7d_count,
              get_last_30d_count,
              get_count,
              order_last_1d_count,
              order_last_1d_reduce_amount,
              order_last_1d_original_amount,
              order_last_1d_final_amount,
              order_last_7d_count,
              order_last_7d_reduce_amount,
              order_last_7d_original_amount,
              order_last_7d_final_amount,
              order_last_30d_count,
              order_last_30d_reduce_amount,
              order_last_30d_original_amount,
              order_last_30d_final_amount,
              order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_last_1d_count,
              payment_last_1d_reduce_amount,
              payment_last_1d_amount,
              payment_last_7d_count,
              payment_last_7d_reduce_amount,
              payment_last_7d_amount,
              payment_last_30d_count,
              payment_last_30d_reduce_amount,
              payment_last_30d_amount,
              payment_count,
              payment_reduce_amount,
              payment_amount,
              expire_last_1d_count,
              expire_last_7d_count,
              expire_last_30d_count,
              expire_count
          from dwt_coupon_topic
          where dt=date_add('2020-06-15',-1)
      )old
      full outer join
      (
          select
              coupon_id,
              get_count,
              order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_reduce_amount,
              payment_amount,
              expire_count
          from dws_coupon_info_daycount
          where dt='2020-06-15'
      )1d_ago
      on old.coupon_id=1d_ago.coupon_id
      left join
      (
          select
              coupon_id,
              get_count,
              order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_reduce_amount,
              payment_amount,
              expire_count
          from dws_coupon_info_daycount
          where dt=date_add('2020-06-15',-7)
      )7d_ago
      on old.coupon_id=7d_ago.coupon_id
      left join
      (
          select
              coupon_id,
              get_count,
              order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_reduce_amount,
              payment_amount,
              expire_count
          from dws_coupon_info_daycount
          where dt=date_add('2020-06-15',-30)
      )30d_ago
      on old.coupon_id=30d_ago.coupon_id;
      
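The daily load above maintains the 7-day and 30-day columns incrementally: yesterday's window total, plus today's value, minus the value of the day that has just left the window (the partitions at `date_add(dt,-7)` and `date_add(dt,-30)`). The following is a minimal bash sketch of that arithmetic on made-up numbers. The `counts` array and the `window_sum` helper are hypothetical and not part of the warehouse; the sketch only checks that the incremental result matches a full recomputation.

```shell
#!/bin/bash
# Hypothetical daily get_count values for one coupon; index 0 is the oldest day.
counts=(3 1 4 1 5 9 2 6 5 3)
window=7

# Direct recomputation: sum the last $window entries ending at index $1.
window_sum() {
  local end=$1 sum=0 i
  for ((i = end - window + 1; i <= end; i++)); do
    if ((i >= 0)); then
      sum=$((sum + counts[i]))
    fi
  done
  echo "$sum"
}

# Seed with a full recomputation, as the first-day load does.
rolling=$(window_sum 6)

# Incremental update: previous window + today's value - value leaving the window.
for ((day = 7; day < ${#counts[@]}; day++)); do
  leaving=${counts[day - window]}
  rolling=$((rolling + counts[day] - leaving))
  echo "day=$day incremental=$rolling direct=$(window_sum "$day")"
done
```

Each printed line should show the same value for `incremental` and `direct`. This is also why the first-day load filters on offsets of -6 and -29 (the window includes the load date itself), while the daily load subtracts the partitions at -7 and -30.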

# Activity topic

  1. Table creation statement

    DROP TABLE IF EXISTS dwt_activity_topic;
    CREATE EXTERNAL TABLE dwt_activity_topic(
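        `activity_rule_id` STRING COMMENT 'Activity rule ID',
        `activity_id` STRING COMMENT 'Activity ID',
        `order_last_1d_count` BIGINT COMMENT 'Number of orders under this activity rule in the last 1 day',
        `order_last_1d_reduce_amount` DECIMAL(16,2) COMMENT 'Order discount amount under this activity rule in the last 1 day',
        `order_last_1d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount under this activity rule in the last 1 day',
        `order_last_1d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount under this activity rule in the last 1 day',
        `order_count` BIGINT COMMENT 'Cumulative number of orders under this activity rule',
        `order_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative order discount amount under this activity rule',
        `order_original_amount` DECIMAL(16,2) COMMENT 'Cumulative original order amount under this activity rule',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Cumulative final order amount under this activity rule',
        `payment_last_1d_count` BIGINT COMMENT 'Number of payments under this activity rule in the last 1 day',
        `payment_last_1d_reduce_amount` DECIMAL(16,2) COMMENT 'Payment discount amount under this activity rule in the last 1 day',
        `payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Payment amount under this activity rule in the last 1 day',
        `payment_count` BIGINT COMMENT 'Cumulative number of payments under this activity rule',
        `payment_reduce_amount` DECIMAL(16,2) COMMENT 'Cumulative payment discount amount under this activity rule',
        `payment_amount` DECIMAL(16,2) COMMENT 'Cumulative payment amount under this activity rule'
    ) COMMENT 'Activity topic wide table'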
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwt/dwt_activity_topic/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data loading

    1. First-day load

      insert overwrite table dwt_activity_topic partition(dt='2020-06-14')
      select
          t1.activity_rule_id,
          t1.activity_id,
          nvl(order_last_1d_count,0),
          nvl(order_last_1d_reduce_amount,0),
          nvl(order_last_1d_original_amount,0),
          nvl(order_last_1d_final_amount,0),
          nvl(order_count,0),
          nvl(order_reduce_amount,0),
          nvl(order_original_amount,0),
          nvl(order_final_amount,0),
          nvl(payment_last_1d_count,0),
          nvl(payment_last_1d_reduce_amount,0),
          nvl(payment_last_1d_amount,0),
          nvl(payment_count,0),
          nvl(payment_reduce_amount,0),
          nvl(payment_amount,0)
      from
      (
          select
              activity_rule_id,
              activity_id
          from dim_activity_rule_info
          where dt='2020-06-14'
      )t1
      left join
      (
          select
              activity_rule_id,
              activity_id,
      
              sum(if(dt='2020-06-14',order_count,0)) order_last_1d_count,
              sum(if(dt='2020-06-14',order_reduce_amount,0)) order_last_1d_reduce_amount,
              sum(if(dt='2020-06-14',order_original_amount,0)) order_last_1d_original_amount,
              sum(if(dt='2020-06-14',order_final_amount,0)) order_last_1d_final_amount,
              sum(order_count) order_count,
              sum(order_reduce_amount) order_reduce_amount,
              sum(order_original_amount) order_original_amount,
              sum(order_final_amount) order_final_amount,
              sum(if(dt='2020-06-14',payment_count,0)) payment_last_1d_count,
              sum(if(dt='2020-06-14',payment_reduce_amount,0)) payment_last_1d_reduce_amount,
              sum(if(dt='2020-06-14',payment_amount,0)) payment_last_1d_amount,
              sum(payment_count) payment_count,
              sum(payment_reduce_amount) payment_reduce_amount,
              sum(payment_amount) payment_amount
          from dws_activity_info_daycount
          group by activity_rule_id,activity_id
      )t2
      on t1.activity_rule_id=t2.activity_rule_id
      and t1.activity_id=t2.activity_id;
      
    2. Daily load

      insert overwrite table dwt_activity_topic partition(dt='2020-06-15')
      select
          nvl(1d_ago.activity_rule_id,old.activity_rule_id),
          nvl(1d_ago.activity_id,old.activity_id),
          nvl(1d_ago.order_count,0),
          nvl(1d_ago.order_reduce_amount,0.0),
          nvl(1d_ago.order_original_amount,0.0),
          nvl(1d_ago.order_final_amount,0.0),
          nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
          nvl(old.order_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0),
          nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
          nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
          nvl(1d_ago.payment_count,0),
          nvl(1d_ago.payment_reduce_amount,0.0),
          nvl(1d_ago.payment_amount,0.0),
          nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
          nvl(old.payment_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0),
          nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0)
      from
      (
          select
              activity_rule_id,
              activity_id,
              order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_reduce_amount,
              payment_amount
          from dwt_activity_topic
          where dt=date_add('2020-06-15',-1)
      )old
      full outer join
      (
          select
              activity_rule_id,
              activity_id,
              order_count,
              order_reduce_amount,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_reduce_amount,
              payment_amount
          from dws_activity_info_daycount
          where dt='2020-06-15'
      )1d_ago
      on old.activity_rule_id=1d_ago.activity_rule_id;
      
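The statements in this section hard-code the load date (`'2020-06-15'`). For a scheduled run, the date would normally be passed in from outside. The following is a minimal bash sketch of that idea, not the author's script: `render_clauses` and `do_date` are hypothetical names, and only the three date-dependent clauses of the activity daily load are rendered. The fallback to yesterday assumes GNU `date`.

```shell
#!/bin/bash
# Print the date-dependent clauses of the activity daily load for a given day.
# render_clauses is a hypothetical helper, not part of the original text.
render_clauses() {
  local do_date=$1
  echo "insert overwrite table dwt_activity_topic partition(dt='${do_date}')"
  echo "from dwt_activity_topic where dt=date_add('${do_date}',-1)"
  echo "from dws_activity_info_daycount where dt='${do_date}'"
}

# Load date: first argument if given, otherwise yesterday.
# `date -d "-1 day"` is GNU date syntax, an assumption about the host system.
do_date=${1:-$(date -d "-1 day" +%F)}
render_clauses "$do_date"
```

Run with an explicit date (for example `2020-06-16`) to reload a specific partition. Because the statement uses `insert overwrite`, rerunning the load for the same date replaces that partition rather than appending to it.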

# Region topic

  1. Table creation statement

    DROP TABLE IF EXISTS dwt_area_topic;
    CREATE EXTERNAL TABLE dwt_area_topic(
        `province_id` STRING COMMENT 'Province ID',
        `visit_last_1d_count` BIGINT COMMENT 'Visitor visit count in the last 1 day',
        `login_last_1d_count` BIGINT COMMENT 'User visit count in the last 1 day',
        `visit_last_7d_count` BIGINT COMMENT 'Visitor visit count in the last 7 days',
        `login_last_7d_count` BIGINT COMMENT 'User visit count in the last 7 days',
        `visit_last_30d_count` BIGINT COMMENT 'Visitor visit count in the last 30 days',
        `login_last_30d_count` BIGINT COMMENT 'User visit count in the last 30 days',
        `visit_count` BIGINT COMMENT 'Cumulative visitor visit count',
        `login_count` BIGINT COMMENT 'Cumulative user visit count',
        `order_last_1d_count` BIGINT COMMENT 'Order count in the last 1 day',
        `order_last_1d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 1 day',
        `order_last_1d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 1 day',
        `order_last_7d_count` BIGINT COMMENT 'Order count in the last 7 days',
        `order_last_7d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 7 days',
        `order_last_7d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 7 days',
        `order_last_30d_count` BIGINT COMMENT 'Order count in the last 30 days',
        `order_last_30d_original_amount` DECIMAL(16,2) COMMENT 'Original order amount in the last 30 days',
        `order_last_30d_final_amount` DECIMAL(16,2) COMMENT 'Final order amount in the last 30 days',
        `order_count` BIGINT COMMENT 'Cumulative order count',
        `order_original_amount` DECIMAL(16,2) COMMENT 'Cumulative original order amount',
        `order_final_amount` DECIMAL(16,2) COMMENT 'Cumulative final order amount',
        `payment_last_1d_count` BIGINT COMMENT 'Payment count in the last 1 day',
        `payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 1 day',
        `payment_last_7d_count` BIGINT COMMENT 'Payment count in the last 7 days',
        `payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 7 days',
        `payment_last_30d_count` BIGINT COMMENT 'Payment count in the last 30 days',
        `payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Payment amount in the last 30 days',
        `payment_count` BIGINT COMMENT 'Cumulative payment count',
        `payment_amount` DECIMAL(16,2) COMMENT 'Cumulative payment amount',
        `refund_order_last_1d_count` BIGINT COMMENT 'Order refund count in the last 1 day',
        `refund_order_last_1d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 1 day',
        `refund_order_last_7d_count` BIGINT COMMENT 'Order refund count in the last 7 days',
        `refund_order_last_7d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 7 days',
        `refund_order_last_30d_count` BIGINT COMMENT 'Order refund count in the last 30 days',
        `refund_order_last_30d_amount` DECIMAL(16,2) COMMENT 'Order refund amount in the last 30 days',
        `refund_order_count` BIGINT COMMENT 'Cumulative order refund count',
        `refund_order_amount` DECIMAL(16,2) COMMENT 'Cumulative order refund amount',
        `refund_payment_last_1d_count` BIGINT COMMENT 'Payment refund count in the last 1 day',
        `refund_payment_last_1d_amount` DECIMAL(16,2) COMMENT 'Payment refund amount in the last 1 day',
        `refund_payment_last_7d_count` BIGINT COMMENT 'Payment refund count in the last 7 days',
        `refund_payment_last_7d_amount` DECIMAL(16,2) COMMENT 'Payment refund amount in the last 7 days',
        `refund_payment_last_30d_count` BIGINT COMMENT 'Payment refund count in the last 30 days',
        `refund_payment_last_30d_amount` DECIMAL(16,2) COMMENT 'Payment refund amount in the last 30 days',
        `refund_payment_count` BIGINT COMMENT 'Cumulative payment refund count',
        `refund_payment_amount` DECIMAL(16,2) COMMENT 'Cumulative payment refund amount'
    ) COMMENT 'Region topic wide table'
    PARTITIONED BY (`dt` STRING)
    STORED AS PARQUET
    LOCATION '/warehouse/gmall/dwt/dwt_area_topic/'
    TBLPROPERTIES ("parquet.compression"="lzo");
    
  2. Data loading

    1. First-day load

      insert overwrite table dwt_area_topic partition(dt='2020-06-14')
      select
          id,
          nvl(visit_last_1d_count,0),
          nvl(login_last_1d_count,0),
          nvl(visit_last_7d_count,0),
          nvl(login_last_7d_count,0),
          nvl(visit_last_30d_count,0),
          nvl(login_last_30d_count,0),
          nvl(visit_count,0),
          nvl(login_count,0),
          nvl(order_last_1d_count,0),
          nvl(order_last_1d_original_amount,0),
          nvl(order_last_1d_final_amount,0),
          nvl(order_last_7d_count,0),
          nvl(order_last_7d_original_amount,0),
          nvl(order_last_7d_final_amount,0),
          nvl(order_last_30d_count,0),
          nvl(order_last_30d_original_amount,0),
          nvl(order_last_30d_final_amount,0),
          nvl(order_count,0),
          nvl(order_original_amount,0),
          nvl(order_final_amount,0),
          nvl(payment_last_1d_count,0),
          nvl(payment_last_1d_amount,0),
          nvl(payment_last_7d_count,0),
          nvl(payment_last_7d_amount,0),
          nvl(payment_last_30d_count,0),
          nvl(payment_last_30d_amount,0),
          nvl(payment_count,0),
          nvl(payment_amount,0),
          nvl(refund_order_last_1d_count,0),
          nvl(refund_order_last_1d_amount,0),
          nvl(refund_order_last_7d_count,0),
          nvl(refund_order_last_7d_amount,0),
          nvl(refund_order_last_30d_count,0),
          nvl(refund_order_last_30d_amount,0),
          nvl(refund_order_count,0),
          nvl(refund_order_amount,0),
          nvl(refund_payment_last_1d_count,0),
          nvl(refund_payment_last_1d_amount,0),
          nvl(refund_payment_last_7d_count,0),
          nvl(refund_payment_last_7d_amount,0),
          nvl(refund_payment_last_30d_count,0),
          nvl(refund_payment_last_30d_amount,0),
          nvl(refund_payment_count,0),
          nvl(refund_payment_amount,0)
      from
      (
          select
              id
          from dim_base_province
      )t1
      left join
      (
          select
              province_id province_id,
              sum(if(dt='2020-06-14',visit_count,0)) visit_last_1d_count,
              sum(if(dt='2020-06-14',login_count,0)) login_last_1d_count,
              sum(if(dt>=date_add('2020-06-14',-6),visit_count,0)) visit_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),login_count,0)) login_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-29),visit_count,0)) visit_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),login_count,0)) login_last_30d_count,
              sum(visit_count) visit_count,
              sum(login_count) login_count,
              sum(if(dt='2020-06-14',order_count,0)) order_last_1d_count,
              sum(if(dt='2020-06-14',order_original_amount,0)) order_last_1d_original_amount,
              sum(if(dt='2020-06-14',order_final_amount,0)) order_last_1d_final_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_count,0)) order_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),order_original_amount,0)) order_last_7d_original_amount,
              sum(if(dt>=date_add('2020-06-14',-6),order_final_amount,0)) order_last_7d_final_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_count,0)) order_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),order_original_amount,0)) order_last_30d_original_amount,
              sum(if(dt>=date_add('2020-06-14',-29),order_final_amount,0)) order_last_30d_final_amount,
              sum(order_count) order_count,
              sum(order_original_amount) order_original_amount,
              sum(order_final_amount) order_final_amount,
              sum(if(dt='2020-06-14',payment_count,0)) payment_last_1d_count,
              sum(if(dt='2020-06-14',payment_amount,0)) payment_last_1d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),payment_count,0)) payment_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),payment_amount,0)) payment_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),payment_count,0)) payment_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),payment_amount,0)) payment_last_30d_amount,
              sum(payment_count) payment_count,
              sum(payment_amount) payment_amount,
              sum(if(dt='2020-06-14',refund_order_count,0)) refund_order_last_1d_count,
              sum(if(dt='2020-06-14',refund_order_amount,0)) refund_order_last_1d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),refund_order_count,0)) refund_order_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),refund_order_amount,0)) refund_order_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),refund_order_count,0)) refund_order_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),refund_order_amount,0)) refund_order_last_30d_amount,
              sum(refund_order_count) refund_order_count,
              sum(refund_order_amount) refund_order_amount,
              sum(if(dt='2020-06-14',refund_payment_count,0)) refund_payment_last_1d_count,
              sum(if(dt='2020-06-14',refund_payment_amount,0)) refund_payment_last_1d_amount,
              sum(if(dt>=date_add('2020-06-14',-6),refund_payment_count,0)) refund_payment_last_7d_count,
              sum(if(dt>=date_add('2020-06-14',-6),refund_payment_amount,0)) refund_payment_last_7d_amount,
              sum(if(dt>=date_add('2020-06-14',-29),refund_payment_count,0)) refund_payment_last_30d_count,
              sum(if(dt>=date_add('2020-06-14',-29),refund_payment_amount,0)) refund_payment_last_30d_amount,
              sum(refund_payment_count) refund_payment_count,
              sum(refund_payment_amount) refund_payment_amount
          from dws_area_stats_daycount
          group by province_id
      )t2
      on t1.id=t2.province_id;
      
    2. Daily load

      insert overwrite table dwt_area_topic partition(dt='2020-06-15')
      select
          nvl(old.province_id, 1d_ago.province_id),
          nvl(1d_ago.visit_count,0),
          nvl(1d_ago.login_count,0),
          nvl(old.visit_last_7d_count,0)+nvl(1d_ago.visit_count,0)- nvl(7d_ago.visit_count,0),
          nvl(old.login_last_7d_count,0)+nvl(1d_ago.login_count,0)- nvl(7d_ago.login_count,0),
          nvl(old.visit_last_30d_count,0)+nvl(1d_ago.visit_count,0)- nvl(30d_ago.visit_count,0),
          nvl(old.login_last_30d_count,0)+nvl(1d_ago.login_count,0)- nvl(30d_ago.login_count,0),
          nvl(old.visit_count,0)+nvl(1d_ago.visit_count,0),
          nvl(old.login_count,0)+nvl(1d_ago.login_count,0),
          nvl(1d_ago.order_count,0),
          nvl(1d_ago.order_original_amount,0.0),
          nvl(1d_ago.order_final_amount,0.0),
          nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)- nvl(7d_ago.order_count,0),
          nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(7d_ago.order_original_amount,0.0),
          nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(7d_ago.order_final_amount,0.0),
          nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)- nvl(30d_ago.order_count,0),
          nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(30d_ago.order_original_amount,0.0),
          nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(30d_ago.order_final_amount,0.0),
          nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
          nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
          nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
          nvl(1d_ago.payment_count,0),
          nvl(1d_ago.payment_amount,0.0),
          nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)- nvl(7d_ago.payment_count,0),
          nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(7d_ago.payment_amount,0.0),
          nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)- nvl(30d_ago.payment_count,0),
          nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(30d_ago.payment_amount,0.0),
          nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
          nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
          nvl(1d_ago.refund_order_count,0),
          nvl(1d_ago.refund_order_amount,0.0),
          nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(7d_ago.refund_order_count,0),
          nvl(old.refund_order_last_7d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)- nvl(7d_ago.refund_order_amount,0.0),
          nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(30d_ago.refund_order_count,0),
          nvl(old.refund_order_last_30d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)- nvl(30d_ago.refund_order_amount,0.0),
          nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
          nvl(old.refund_order_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0),
          nvl(1d_ago.refund_payment_count,0),
          nvl(1d_ago.refund_payment_amount,0.0),
          nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)- nvl(7d_ago.refund_payment_count,0),
          nvl(old.refund_payment_last_7d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)- nvl(7d_ago.refund_payment_amount,0.0),
          nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)- nvl(30d_ago.refund_payment_count,0),
          nvl(old.refund_payment_last_30d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)- nvl(30d_ago.refund_payment_amount,0.0),
          nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
          nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)
      
      from
      (
          select
              province_id,
              visit_last_1d_count,
              login_last_1d_count,
              visit_last_7d_count,
              login_last_7d_count,
              visit_last_30d_count,
              login_last_30d_count,
              visit_count,
              login_count,
              order_last_1d_count,
              order_last_1d_original_amount,
              order_last_1d_final_amount,
              order_last_7d_count,
              order_last_7d_original_amount,
              order_last_7d_final_amount,
              order_last_30d_count,
              order_last_30d_original_amount,
              order_last_30d_final_amount,
              order_count,
              order_original_amount,
              order_final_amount,
              payment_last_1d_count,
              payment_last_1d_amount,
              payment_last_7d_count,
              payment_last_7d_amount,
              payment_last_30d_count,
              payment_last_30d_amount,
              payment_count,
              payment_amount,
              refund_order_last_1d_count,
              refund_order_last_1d_amount,
              refund_order_last_7d_count,
              refund_order_last_7d_amount,
              refund_order_last_30d_count,
              refund_order_last_30d_amount,
              refund_order_count,
              refund_order_amount,
              refund_payment_last_1d_count,
              refund_payment_last_1d_amount,
              refund_payment_last_7d_count,
              refund_payment_last_7d_amount,
              refund_payment_last_30d_count,
              refund_payment_last_30d_amount,
              refund_payment_count,
              refund_payment_amount
          from dwt_area_topic
          where dt=date_add('2020-06-15',-1)
      )old
      full outer join
      (
          select
              province_id,
              visit_count,
              login_count,
              order_count,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_amount,
              refund_order_count,
              refund_order_amount,
              refund_payment_count,
              refund_payment_amount
          from dws_area_stats_daycount
          where dt='2020-06-15'
      )1d_ago
      on old.province_id=1d_ago.province_id
      left join
      (
          select
              province_id,
              visit_count,
              login_count,
              order_count,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_amount,
              refund_order_count,
              refund_order_amount,
              refund_payment_count,
              refund_payment_amount
          from dws_area_stats_daycount
          where dt=date_add('2020-06-15',-7)
      )7d_ago
      on old.province_id= 7d_ago.province_id
      left join
      (
          select
              province_id,
              visit_count,
              login_count,
              order_count,
              order_original_amount,
              order_final_amount,
              payment_count,
              payment_amount,
              refund_order_count,
              refund_order_amount,
              refund_payment_count,
              refund_payment_amount
          from dws_area_stats_daycount
          where dt=date_add('2020-06-15',-30)
      )30d_ago
      on old.province_id= 30d_ago.province_id;
      

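The last-7-day and last-30-day columns in the query above are not recomputed from scratch. Each is rolled forward as `nvl(old,0) + nvl(1d_ago,0) - nvl(7d_ago,0)`: yesterday's window total, plus the new day, minus the day that drops out of the window. The sketch below illustrates that arithmetic in plain shell and compares it with a full recompute. The function names `window_sum` and `roll_forward` and the daily counts are hypothetical, made up for this illustration only, and are not part of the warehouse scripts.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of the warehouse scripts.

# Full recompute: sum the last <window> values of the list that follows.
# Usage: window_sum <window> <v1> <v2> ...
window_sum() {
  window=$1; shift
  skip=$(( $# - window ))      # number of leading values outside the window
  sum=0
  for v in "$@"; do
    if [ "$skip" -gt 0 ]; then
      skip=$(( skip - 1 ))
      continue
    fi
    sum=$(( sum + v ))
  done
  echo "$sum"
}

# Incremental update: old window total + new day - day leaving the window.
# Empty or missing arguments count as 0, like nvl(x,0) in the SQL.
roll_forward() {
  echo $(( ${1:-0} + ${2:-0} - ${3:-0} ))
}

# Made-up daily counts for days 1..7, then day 8 arrives with a count of 6.
old=$(window_sum 7 3 0 5 2 7 1 4)        # window covers days 1..7
new=$(roll_forward "$old" 6 3)           # add day 8 (6), drop day 1 (3)
full=$(window_sum 7 3 0 5 2 7 1 4 6)     # recompute over days 2..8
echo "incremental=$new full=$full"
```

Both values agree as long as the previous day's total is correct, which is why the daily load only needs three DWS partitions (today, 7 days ago, 30 days ago) instead of scanning the whole window.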
# DWT layer first-day data load script

  1. Create the script dws_to_dwt_init.sh in the /home/damoncai/bin directory

    #!/bin/bash
    
    APP=gmall
    
    if [ -n "$2" ] ;then
       do_date=$2
    else 
       echo "Please pass in a date argument"
       exit 1
    fi 
    
    dwt_visitor_topic="
    insert overwrite table ${APP}.dwt_visitor_topic partition(dt='$do_date')
    select
        nvl(1d_ago.mid_id,old.mid_id),
        nvl(1d_ago.brand,old.brand),
        nvl(1d_ago.model,old.model),
        nvl(1d_ago.channel,old.channel),
        nvl(1d_ago.os,old.os),
        nvl(1d_ago.area_code,old.area_code),
        nvl(1d_ago.version_code,old.version_code),
        case when old.mid_id is null and 1d_ago.is_new=1 then '$do_date'
             when old.mid_id is null and 1d_ago.is_new=0 then '2020-06-13' --the exact first login date cannot be obtained, so assign a date before the warehouse build date
             else old.visit_date_first end,
        if(1d_ago.mid_id is not null,'$do_date',old.visit_date_last),
        nvl(1d_ago.visit_count,0),
        if(1d_ago.mid_id is null,0,1),
        nvl(old.visit_last_7d_count,0)+nvl(1d_ago.visit_count,0)- nvl(7d_ago.visit_count,0),
        nvl(old.visit_last_7d_day_count,0)+if(1d_ago.mid_id is null,0,1)- if(7d_ago.mid_id is null,0,1),
        nvl(old.visit_last_30d_count,0)+nvl(1d_ago.visit_count,0)- nvl(30d_ago.visit_count,0),
        nvl(old.visit_last_30d_day_count,0)+if(1d_ago.mid_id is null,0,1)- if(30d_ago.mid_id is null,0,1),
        nvl(old.visit_count,0)+nvl(1d_ago.visit_count,0),
        nvl(old.visit_day_count,0)+if(1d_ago.mid_id is null,0,1)
    from
    (
        select
            mid_id,
            brand,
            model,
            channel,
            os,
            area_code,
            version_code,
            visit_date_first,
            visit_date_last,
            visit_last_1d_count,
            visit_last_1d_day_count,
            visit_last_7d_count,
            visit_last_7d_day_count,
            visit_last_30d_count,
            visit_last_30d_day_count,
            visit_count,
            visit_day_count
        from ${APP}.dwt_visitor_topic
        where dt=date_add('$do_date',-1)
    )old
    full outer join
    (
        select
            mid_id,
            brand,
            model,
            is_new,
            channel,
            os,
            area_code,
            version_code,
            visit_count
        from ${APP}.dws_visitor_action_daycount
        where dt='$do_date'
    )1d_ago
    on old.mid_id=1d_ago.mid_id
    left join
    (
        select
            mid_id,
            brand,
            model,
            is_new,
            channel,
            os,
            area_code,
            version_code,
            visit_count
        from ${APP}.dws_visitor_action_daycount
        where dt=date_add('$do_date',-7)
    )7d_ago
    on old.mid_id=7d_ago.mid_id
    left join
    (
        select
            mid_id,
            brand,
            model,
            is_new,
            channel,
            os,
            area_code,
            version_code,
            visit_count
        from ${APP}.dws_visitor_action_daycount
        where dt=date_add('$do_date',-30)
    )30d_ago
    on old.mid_id=30d_ago.mid_id;
    "
    
    dwt_user_topic="
    insert overwrite table ${APP}.dwt_user_topic partition(dt='$do_date')
    select
        id,
        login_date_first, --use the user creation date as the first login date
        nvl(login_date_last,date_add('$do_date',-1)), --if login history exists, take the last login date from it, otherwise fall back to a fixed date
        nvl(login_last_1d_count,0),
        nvl(login_last_1d_day_count,0),
        nvl(login_last_7d_count,0),
        nvl(login_last_7d_day_count,0),
        nvl(login_last_30d_count,0),
        nvl(login_last_30d_day_count,0),
        nvl(login_count,0),
        nvl(login_day_count,0),
        order_date_first,
        order_date_last,
        nvl(order_last_1d_count,0),
        nvl(order_activity_last_1d_count,0),
        nvl(order_activity_reduce_last_1d_amount,0),
        nvl(order_coupon_last_1d_count,0),
        nvl(order_coupon_reduce_last_1d_amount,0),
        nvl(order_last_1d_original_amount,0),
        nvl(order_last_1d_final_amount,0),
        nvl(order_last_7d_count,0),
        nvl(order_activity_last_7d_count,0),
        nvl(order_activity_reduce_last_7d_amount,0),
        nvl(order_coupon_last_7d_count,0),
        nvl(order_coupon_reduce_last_7d_amount,0),
        nvl(order_last_7d_original_amount,0),
        nvl(order_last_7d_final_amount,0),
        nvl(order_last_30d_count,0),
        nvl(order_activity_last_30d_count,0),
        nvl(order_activity_reduce_last_30d_amount,0),
        nvl(order_coupon_last_30d_count,0),
        nvl(order_coupon_reduce_last_30d_amount,0),
        nvl(order_last_30d_original_amount,0),
        nvl(order_last_30d_final_amount,0),
        nvl(order_count,0),
        nvl(order_activity_count,0),
        nvl(order_activity_reduce_amount,0),
        nvl(order_coupon_count,0),
        nvl(order_coupon_reduce_amount,0),
        nvl(order_original_amount,0),
        nvl(order_final_amount,0),
        payment_date_first,
        payment_date_last,
        nvl(payment_last_1d_count,0),
        nvl(payment_last_1d_amount,0),
        nvl(payment_last_7d_count,0),
        nvl(payment_last_7d_amount,0),
        nvl(payment_last_30d_count,0),
        nvl(payment_last_30d_amount,0),
        nvl(payment_count,0),
        nvl(payment_amount,0),
        nvl(refund_order_last_1d_count,0),
        nvl(refund_order_last_1d_num,0),
        nvl(refund_order_last_1d_amount,0),
        nvl(refund_order_last_7d_count,0),
        nvl(refund_order_last_7d_num,0),
        nvl(refund_order_last_7d_amount,0),
        nvl(refund_order_last_30d_count,0),
        nvl(refund_order_last_30d_num,0),
        nvl(refund_order_last_30d_amount,0),
        nvl(refund_order_count,0),
        nvl(refund_order_num,0),
        nvl(refund_order_amount,0),
        nvl(refund_payment_last_1d_count,0),
        nvl(refund_payment_last_1d_num,0),
        nvl(refund_payment_last_1d_amount,0),
        nvl(refund_payment_last_7d_count,0),
        nvl(refund_payment_last_7d_num,0),
        nvl(refund_payment_last_7d_amount,0),
        nvl(refund_payment_last_30d_count,0),
        nvl(refund_payment_last_30d_num,0),
        nvl(refund_payment_last_30d_amount,0),
        nvl(refund_payment_count,0),
        nvl(refund_payment_num,0),
        nvl(refund_payment_amount,0),
        nvl(cart_last_1d_count,0),
        nvl(cart_last_7d_count,0),
        nvl(cart_last_30d_count,0),
        nvl(cart_count,0),
        nvl(favor_last_1d_count,0),
        nvl(favor_last_7d_count,0),
        nvl(favor_last_30d_count,0),
        nvl(favor_count,0),
        nvl(coupon_last_1d_get_count,0),
        nvl(coupon_last_1d_using_count,0),
        nvl(coupon_last_1d_used_count,0),
        nvl(coupon_last_7d_get_count,0),
        nvl(coupon_last_7d_using_count,0),
        nvl(coupon_last_7d_used_count,0),
        nvl(coupon_last_30d_get_count,0),
        nvl(coupon_last_30d_using_count,0),
        nvl(coupon_last_30d_used_count,0),
        nvl(coupon_get_count,0),
        nvl(coupon_using_count,0),
        nvl(coupon_used_count,0),
        nvl(appraise_last_1d_good_count,0),
        nvl(appraise_last_1d_mid_count,0),
        nvl(appraise_last_1d_bad_count,0),
        nvl(appraise_last_1d_default_count,0),
        nvl(appraise_last_7d_good_count,0),
        nvl(appraise_last_7d_mid_count,0),
        nvl(appraise_last_7d_bad_count,0),
        nvl(appraise_last_7d_default_count,0),
        nvl(appraise_last_30d_good_count,0),
        nvl(appraise_last_30d_mid_count,0),
        nvl(appraise_last_30d_bad_count,0),
        nvl(appraise_last_30d_default_count,0),
        nvl(appraise_good_count,0),
        nvl(appraise_mid_count,0),
        nvl(appraise_bad_count,0),
        nvl(appraise_default_count,0)
    from
    (
        select
            id,
            date_format(create_time,'yyyy-MM-dd') login_date_first
        from ${APP}.dim_user_info
        where dt='9999-99-99'
    )t1
    left join
    (
        select
            user_id user_id,
            max(dt) login_date_last,
            sum(if(dt='$do_date',login_count,0)) login_last_1d_count,
            sum(if(dt='$do_date' and login_count>0,1,0)) login_last_1d_day_count,
            sum(if(dt>=date_add('$do_date',-6),login_count,0)) login_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6) and login_count>0,1,0)) login_last_7d_day_count,
            sum(if(dt>=date_add('$do_date',-29),login_count,0)) login_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29) and login_count>0,1,0)) login_last_30d_day_count,
            sum(login_count) login_count,
            sum(if(login_count>0,1,0)) login_day_count,
            min(if(order_count>0,dt,null)) order_date_first,
            max(if(order_count>0,dt,null)) order_date_last,
            sum(if(dt='$do_date',order_count,0)) order_last_1d_count,
            sum(if(dt='$do_date',order_activity_count,0)) order_activity_last_1d_count,
            sum(if(dt='$do_date',order_activity_reduce_amount,0)) order_activity_reduce_last_1d_amount,
            sum(if(dt='$do_date',order_coupon_count,0)) order_coupon_last_1d_count,
            sum(if(dt='$do_date',order_coupon_reduce_amount,0)) order_coupon_reduce_last_1d_amount,
            sum(if(dt='$do_date',order_original_amount,0)) order_last_1d_original_amount,
            sum(if(dt='$do_date',order_final_amount,0)) order_last_1d_final_amount,
            sum(if(dt>=date_add('$do_date',-6),order_count,0)) order_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),order_activity_count,0)) order_activity_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),order_activity_reduce_amount,0)) order_activity_reduce_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-6),order_coupon_count,0)) order_coupon_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),order_coupon_reduce_amount,0)) order_coupon_reduce_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-6),order_original_amount,0)) order_last_7d_original_amount,
            sum(if(dt>=date_add('$do_date',-6),order_final_amount,0)) order_last_7d_final_amount,
            sum(if(dt>=date_add('$do_date',-29),order_count,0)) order_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),order_activity_count,0)) order_activity_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),order_activity_reduce_amount,0)) order_activity_reduce_last_30d_amount,
            sum(if(dt>=date_add('$do_date',-29),order_coupon_count,0)) order_coupon_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),order_coupon_reduce_amount,0)) order_coupon_reduce_last_30d_amount,
            sum(if(dt>=date_add('$do_date',-29),order_original_amount,0)) order_last_30d_original_amount,
            sum(if(dt>=date_add('$do_date',-29),order_final_amount,0)) order_last_30d_final_amount,
            sum(order_count) order_count,
            sum(order_activity_count) order_activity_count,
            sum(order_activity_reduce_amount) order_activity_reduce_amount,
            sum(order_coupon_count) order_coupon_count,
            sum(order_coupon_reduce_amount) order_coupon_reduce_amount,
            sum(order_original_amount) order_original_amount,
            sum(order_final_amount) order_final_amount,
            min(if(payment_count>0,dt,null)) payment_date_first,
            max(if(payment_count>0,dt,null)) payment_date_last,
            sum(if(dt='$do_date',payment_count,0)) payment_last_1d_count,
            sum(if(dt='$do_date',payment_amount,0)) payment_last_1d_amount,
            sum(if(dt>=date_add('$do_date',-6),payment_count,0)) payment_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),payment_amount,0)) payment_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-29),payment_count,0)) payment_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),payment_amount,0)) payment_last_30d_amount,
            sum(payment_count) payment_count,
            sum(payment_amount) payment_amount,
            sum(if(dt='$do_date',refund_order_count,0)) refund_order_last_1d_count,
            sum(if(dt='$do_date',refund_order_num,0)) refund_order_last_1d_num,
            sum(if(dt='$do_date',refund_order_amount,0)) refund_order_last_1d_amount,
            sum(if(dt>=date_add('$do_date',-6),refund_order_count,0)) refund_order_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),refund_order_num,0)) refund_order_last_7d_num,
            sum(if(dt>=date_add('$do_date',-6),refund_order_amount,0)) refund_order_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-29),refund_order_count,0)) refund_order_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),refund_order_num,0)) refund_order_last_30d_num,
            sum(if(dt>=date_add('$do_date',-29),refund_order_amount,0)) refund_order_last_30d_amount,
            sum(refund_order_count) refund_order_count,
            sum(refund_order_num) refund_order_num,
            sum(refund_order_amount) refund_order_amount,
            sum(if(dt='$do_date',refund_payment_count,0)) refund_payment_last_1d_count,
            sum(if(dt='$do_date',refund_payment_num,0)) refund_payment_last_1d_num,
            sum(if(dt='$do_date',refund_payment_amount,0)) refund_payment_last_1d_amount,
            sum(if(dt>=date_add('$do_date',-6),refund_payment_count,0)) refund_payment_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),refund_payment_num,0)) refund_payment_last_7d_num,
            sum(if(dt>=date_add('$do_date',-6),refund_payment_amount,0)) refund_payment_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-29),refund_payment_count,0)) refund_payment_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),refund_payment_num,0)) refund_payment_last_30d_num,
            sum(if(dt>=date_add('$do_date',-29),refund_payment_amount,0)) refund_payment_last_30d_amount,
            sum(refund_payment_count) refund_payment_count,
            sum(refund_payment_num) refund_payment_num,
            sum(refund_payment_amount) refund_payment_amount,
            sum(if(dt='$do_date',cart_count,0)) cart_last_1d_count,
            sum(if(dt>=date_add('$do_date',-6),cart_count,0)) cart_last_7d_count,
            sum(if(dt>=date_add('$do_date',-29),cart_count,0)) cart_last_30d_count,
            sum(cart_count) cart_count,
            sum(if(dt='$do_date',favor_count,0)) favor_last_1d_count,
            sum(if(dt>=date_add('$do_date',-6),favor_count,0)) favor_last_7d_count,
            sum(if(dt>=date_add('$do_date',-29),favor_count,0)) favor_last_30d_count,
            sum(favor_count) favor_count,
            sum(if(dt='$do_date',coupon_get_count,0)) coupon_last_1d_get_count,
            sum(if(dt='$do_date',coupon_using_count,0)) coupon_last_1d_using_count,
            sum(if(dt='$do_date',coupon_used_count,0)) coupon_last_1d_used_count,
            sum(if(dt>=date_add('$do_date',-6),coupon_get_count,0)) coupon_last_7d_get_count,
            sum(if(dt>=date_add('$do_date',-6),coupon_using_count,0)) coupon_last_7d_using_count,
            sum(if(dt>=date_add('$do_date',-6),coupon_used_count,0)) coupon_last_7d_used_count,
            sum(if(dt>=date_add('$do_date',-29),coupon_get_count,0)) coupon_last_30d_get_count,
            sum(if(dt>=date_add('$do_date',-29),coupon_using_count,0)) coupon_last_30d_using_count,
            sum(if(dt>=date_add('$do_date',-29),coupon_used_count,0)) coupon_last_30d_used_count,
            sum(coupon_get_count) coupon_get_count,
            sum(coupon_using_count) coupon_using_count,
            sum(coupon_used_count) coupon_used_count,
            sum(if(dt='$do_date',appraise_good_count,0)) appraise_last_1d_good_count,
            sum(if(dt='$do_date',appraise_mid_count,0)) appraise_last_1d_mid_count,
            sum(if(dt='$do_date',appraise_bad_count,0)) appraise_last_1d_bad_count,
            sum(if(dt='$do_date',appraise_default_count,0)) appraise_last_1d_default_count,
            sum(if(dt>=date_add('$do_date',-6),appraise_good_count,0)) appraise_last_7d_good_count,
            sum(if(dt>=date_add('$do_date',-6),appraise_mid_count,0)) appraise_last_7d_mid_count,
            sum(if(dt>=date_add('$do_date',-6),appraise_bad_count,0)) appraise_last_7d_bad_count,
            sum(if(dt>=date_add('$do_date',-6),appraise_default_count,0)) appraise_last_7d_default_count,
            sum(if(dt>=date_add('$do_date',-29),appraise_good_count,0)) appraise_last_30d_good_count,
            sum(if(dt>=date_add('$do_date',-29),appraise_mid_count,0)) appraise_last_30d_mid_count,
            sum(if(dt>=date_add('$do_date',-29),appraise_bad_count,0)) appraise_last_30d_bad_count,
            sum(if(dt>=date_add('$do_date',-29),appraise_default_count,0)) appraise_last_30d_default_count,
            sum(appraise_good_count) appraise_good_count,
            sum(appraise_mid_count) appraise_mid_count,
            sum(appraise_bad_count) appraise_bad_count,
            sum(appraise_default_count) appraise_default_count
        from ${APP}.dws_user_action_daycount
        group by user_id
    )t2
    on t1.id=t2.user_id;
    "
    
    dwt_sku_topic="
    insert overwrite table ${APP}.dwt_sku_topic partition(dt='$do_date')
    select
        id,
        nvl(order_last_1d_count,0),
        nvl(order_last_1d_num,0),
        nvl(order_activity_last_1d_count,0),
        nvl(order_coupon_last_1d_count,0),
        nvl(order_activity_reduce_last_1d_amount,0),
        nvl(order_coupon_reduce_last_1d_amount,0),
        nvl(order_last_1d_original_amount,0),
        nvl(order_last_1d_final_amount,0),
        nvl(order_last_7d_count,0),
        nvl(order_last_7d_num,0),
        nvl(order_activity_last_7d_count,0),
        nvl(order_coupon_last_7d_count,0),
        nvl(order_activity_reduce_last_7d_amount,0),
        nvl(order_coupon_reduce_last_7d_amount,0),
        nvl(order_last_7d_original_amount,0),
        nvl(order_last_7d_final_amount,0),
        nvl(order_last_30d_count,0),
        nvl(order_last_30d_num,0),
        nvl(order_activity_last_30d_count,0),
        nvl(order_coupon_last_30d_count,0),
        nvl(order_activity_reduce_last_30d_amount,0),
        nvl(order_coupon_reduce_last_30d_amount,0),
        nvl(order_last_30d_original_amount,0),
        nvl(order_last_30d_final_amount,0),
        nvl(order_count,0),
        nvl(order_num,0),
        nvl(order_activity_count,0),
        nvl(order_coupon_count,0),
        nvl(order_activity_reduce_amount,0),
        nvl(order_coupon_reduce_amount,0),
        nvl(order_original_amount,0),
        nvl(order_final_amount,0),
        nvl(payment_last_1d_count,0),
        nvl(payment_last_1d_num,0),
        nvl(payment_last_1d_amount,0),
        nvl(payment_last_7d_count,0),
        nvl(payment_last_7d_num,0),
        nvl(payment_last_7d_amount,0),
        nvl(payment_last_30d_count,0),
        nvl(payment_last_30d_num,0),
        nvl(payment_last_30d_amount,0),
        nvl(payment_count,0),
        nvl(payment_num,0),
        nvl(payment_amount,0),
        nvl(refund_order_last_1d_count,0),
        nvl(refund_order_last_1d_num,0),
        nvl(refund_order_last_1d_amount,0),
        nvl(refund_order_last_7d_count,0),
        nvl(refund_order_last_7d_num,0),
        nvl(refund_order_last_7d_amount,0),
        nvl(refund_order_last_30d_count,0),
        nvl(refund_order_last_30d_num,0),
        nvl(refund_order_last_30d_amount,0),
        nvl(refund_order_count,0),
        nvl(refund_order_num,0),
        nvl(refund_order_amount,0),
        nvl(refund_payment_last_1d_count,0),
        nvl(refund_payment_last_1d_num,0),
        nvl(refund_payment_last_1d_amount,0),
        nvl(refund_payment_last_7d_count,0),
        nvl(refund_payment_last_7d_num,0),
        nvl(refund_payment_last_7d_amount,0),
        nvl(refund_payment_last_30d_count,0),
        nvl(refund_payment_last_30d_num,0),
        nvl(refund_payment_last_30d_amount,0),
        nvl(refund_payment_count,0),
        nvl(refund_payment_num,0),
        nvl(refund_payment_amount,0),
        nvl(cart_last_1d_count,0),
        nvl(cart_last_7d_count,0),
        nvl(cart_last_30d_count,0),
        nvl(cart_count,0),
        nvl(favor_last_1d_count,0),
        nvl(favor_last_7d_count,0),
        nvl(favor_last_30d_count,0),
        nvl(favor_count,0),
        nvl(appraise_last_1d_good_count,0),
        nvl(appraise_last_1d_mid_count,0),
        nvl(appraise_last_1d_bad_count,0),
        nvl(appraise_last_1d_default_count,0),
        nvl(appraise_last_7d_good_count,0),
        nvl(appraise_last_7d_mid_count,0),
        nvl(appraise_last_7d_bad_count,0),
        nvl(appraise_last_7d_default_count,0),
        nvl(appraise_last_30d_good_count,0),
        nvl(appraise_last_30d_mid_count,0),
        nvl(appraise_last_30d_bad_count,0),
        nvl(appraise_last_30d_default_count,0),
        nvl(appraise_good_count,0),
        nvl(appraise_mid_count,0),
        nvl(appraise_bad_count,0),
        nvl(appraise_default_count,0)
    from
    (
        select
            id
        from ${APP}.dim_sku_info
        where dt='$do_date'
    )t1
    left join
    (
        select
            sku_id,
            sum(if(dt='$do_date',order_count,0)) order_last_1d_count,
            sum(if(dt='$do_date',order_num,0)) order_last_1d_num,
            sum(if(dt='$do_date',order_activity_count,0)) order_activity_last_1d_count,
            sum(if(dt='$do_date',order_coupon_count,0)) order_coupon_last_1d_count,
            sum(if(dt='$do_date',order_activity_reduce_amount,0)) order_activity_reduce_last_1d_amount,
            sum(if(dt='$do_date',order_coupon_reduce_amount,0)) order_coupon_reduce_last_1d_amount,
            sum(if(dt='$do_date',order_original_amount,0)) order_last_1d_original_amount,
            sum(if(dt='$do_date',order_final_amount,0)) order_last_1d_final_amount,
            sum(if(dt>=date_add('$do_date',-6),order_count,0)) order_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),order_num,0)) order_last_7d_num,
            sum(if(dt>=date_add('$do_date',-6),order_activity_count,0)) order_activity_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),order_coupon_count,0)) order_coupon_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),order_activity_reduce_amount,0)) order_activity_reduce_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-6),order_coupon_reduce_amount,0)) order_coupon_reduce_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-6),order_original_amount,0)) order_last_7d_original_amount,
            sum(if(dt>=date_add('$do_date',-6),order_final_amount,0)) order_last_7d_final_amount,
            sum(if(dt>=date_add('$do_date',-29),order_count,0)) order_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),order_num,0)) order_last_30d_num,
            sum(if(dt>=date_add('$do_date',-29),order_activity_count,0)) order_activity_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),order_coupon_count,0)) order_coupon_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),order_activity_reduce_amount,0)) order_activity_reduce_last_30d_amount,
            sum(if(dt>=date_add('$do_date',-29),order_coupon_reduce_amount,0)) order_coupon_reduce_last_30d_amount,
            sum(if(dt>=date_add('$do_date',-29),order_original_amount,0)) order_last_30d_original_amount,
            sum(if(dt>=date_add('$do_date',-29),order_final_amount,0)) order_last_30d_final_amount,
            sum(order_count) order_count,
            sum(order_num) order_num,
            sum(order_activity_count) order_activity_count,
            sum(order_coupon_count) order_coupon_count,
            sum(order_activity_reduce_amount) order_activity_reduce_amount,
            sum(order_coupon_reduce_amount) order_coupon_reduce_amount,
            sum(order_original_amount) order_original_amount,
            sum(order_final_amount) order_final_amount,
            sum(if(dt='$do_date',payment_count,0)) payment_last_1d_count,
            sum(if(dt='$do_date',payment_num,0)) payment_last_1d_num,
            sum(if(dt='$do_date',payment_amount,0)) payment_last_1d_amount,
            sum(if(dt>=date_add('$do_date',-6),payment_count,0)) payment_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),payment_num,0)) payment_last_7d_num,
            sum(if(dt>=date_add('$do_date',-6),payment_amount,0)) payment_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-29),payment_count,0)) payment_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),payment_num,0)) payment_last_30d_num,
            sum(if(dt>=date_add('$do_date',-29),payment_amount,0)) payment_last_30d_amount,
            sum(payment_count) payment_count,
            sum(payment_num) payment_num,
            sum(payment_amount) payment_amount,
            sum(if(dt='$do_date',refund_order_count,0)) refund_order_last_1d_count,
            sum(if(dt='$do_date',refund_order_num,0)) refund_order_last_1d_num,
            sum(if(dt='$do_date',refund_order_amount,0)) refund_order_last_1d_amount,
            sum(if(dt>=date_add('$do_date',-6),refund_order_count,0)) refund_order_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),refund_order_num,0)) refund_order_last_7d_num,
            sum(if(dt>=date_add('$do_date',-6),refund_order_amount,0)) refund_order_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-29),refund_order_count,0)) refund_order_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),refund_order_num,0)) refund_order_last_30d_num,
            sum(if(dt>=date_add('$do_date',-29),refund_order_amount,0)) refund_order_last_30d_amount,
            sum(refund_order_count) refund_order_count,
            sum(refund_order_num) refund_order_num,
            sum(refund_order_amount) refund_order_amount,
            sum(if(dt='$do_date',refund_payment_count,0)) refund_payment_last_1d_count,
            sum(if(dt='$do_date',refund_payment_num,0)) refund_payment_last_1d_num,
            sum(if(dt='$do_date',refund_payment_amount,0)) refund_payment_last_1d_amount,
            sum(if(dt>=date_add('$do_date',-6),refund_payment_count,0)) refund_payment_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),refund_payment_num,0)) refund_payment_last_7d_num,
            sum(if(dt>=date_add('$do_date',-6),refund_payment_amount,0)) refund_payment_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-29),refund_payment_count,0)) refund_payment_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),refund_payment_num,0)) refund_payment_last_30d_num,
            sum(if(dt>=date_add('$do_date',-29),refund_payment_amount,0)) refund_payment_last_30d_amount,
            sum(refund_payment_count) refund_payment_count,
            sum(refund_payment_num) refund_payment_num,
            sum(refund_payment_amount) refund_payment_amount,
            sum(if(dt='$do_date',cart_count,0)) cart_last_1d_count,
            sum(if(dt>=date_add('$do_date',-6),cart_count,0)) cart_last_7d_count,
            sum(if(dt>=date_add('$do_date',-29),cart_count,0)) cart_last_30d_count,
            sum(cart_count) cart_count,
            sum(if(dt='$do_date',favor_count,0)) favor_last_1d_count,
            sum(if(dt>=date_add('$do_date',-6),favor_count,0)) favor_last_7d_count,
            sum(if(dt>=date_add('$do_date',-29),favor_count,0)) favor_last_30d_count,
            sum(favor_count) favor_count,
            sum(if(dt='$do_date',appraise_good_count,0)) appraise_last_1d_good_count,
            sum(if(dt='$do_date',appraise_mid_count,0)) appraise_last_1d_mid_count,
            sum(if(dt='$do_date',appraise_bad_count,0)) appraise_last_1d_bad_count,
            sum(if(dt='$do_date',appraise_default_count,0)) appraise_last_1d_default_count,
            sum(if(dt>=date_add('$do_date',-6),appraise_good_count,0)) appraise_last_7d_good_count,
            sum(if(dt>=date_add('$do_date',-6),appraise_mid_count,0)) appraise_last_7d_mid_count,
            sum(if(dt>=date_add('$do_date',-6),appraise_bad_count,0)) appraise_last_7d_bad_count,
            sum(if(dt>=date_add('$do_date',-6),appraise_default_count,0)) appraise_last_7d_default_count,
            sum(if(dt>=date_add('$do_date',-29),appraise_good_count,0)) appraise_last_30d_good_count,
            sum(if(dt>=date_add('$do_date',-29),appraise_mid_count,0)) appraise_last_30d_mid_count,
            sum(if(dt>=date_add('$do_date',-29),appraise_bad_count,0)) appraise_last_30d_bad_count,
            sum(if(dt>=date_add('$do_date',-29),appraise_default_count,0)) appraise_last_30d_default_count,
            sum(appraise_good_count) appraise_good_count,
            sum(appraise_mid_count) appraise_mid_count,
            sum(appraise_bad_count) appraise_bad_count,
            sum(appraise_default_count) appraise_default_count
        from ${APP}.dws_sku_action_daycount
        group by sku_id
    )t2
    on t1.id=t2.sku_id;
    "
    
    dwt_coupon_topic="
    insert overwrite table ${APP}.dwt_coupon_topic partition(dt='$do_date')
    select
        id,
        nvl(get_last_1d_count,0),
        nvl(get_last_7d_count,0),
        nvl(get_last_30d_count,0),
        nvl(get_count,0),
        nvl(order_last_1d_count,0),
        nvl(order_last_1d_reduce_amount,0),
        nvl(order_last_1d_original_amount,0),
        nvl(order_last_1d_final_amount,0),
        nvl(order_last_7d_count,0),
        nvl(order_last_7d_reduce_amount,0),
        nvl(order_last_7d_original_amount,0),
        nvl(order_last_7d_final_amount,0),
        nvl(order_last_30d_count,0),
        nvl(order_last_30d_reduce_amount,0),
        nvl(order_last_30d_original_amount,0),
        nvl(order_last_30d_final_amount,0),
        nvl(order_count,0),
        nvl(order_reduce_amount,0),
        nvl(order_original_amount,0),
        nvl(order_final_amount,0),
        nvl(payment_last_1d_count,0),
        nvl(payment_last_1d_reduce_amount,0),
        nvl(payment_last_1d_amount,0),
        nvl(payment_last_7d_count,0),
        nvl(payment_last_7d_reduce_amount,0),
        nvl(payment_last_7d_amount,0),
        nvl(payment_last_30d_count,0),
        nvl(payment_last_30d_reduce_amount,0),
        nvl(payment_last_30d_amount,0),
        nvl(payment_count,0),
        nvl(payment_reduce_amount,0),
        nvl(payment_amount,0),
        nvl(expire_last_1d_count,0),
        nvl(expire_last_7d_count,0),
        nvl(expire_last_30d_count,0),
        nvl(expire_count,0)
    from
    (
        select
            id
        from ${APP}.dim_coupon_info
        where dt='$do_date'
    )t1
    left join
    (
        select
            coupon_id coupon_id,
            sum(if(dt='$do_date',get_count,0)) get_last_1d_count,
            sum(if(dt>=date_add('$do_date',-6),get_count,0)) get_last_7d_count,
            sum(if(dt>=date_add('$do_date',-29),get_count,0)) get_last_30d_count,
            sum(get_count) get_count,
            sum(if(dt='$do_date',order_count,0)) order_last_1d_count,
            sum(if(dt='$do_date',order_reduce_amount,0)) order_last_1d_reduce_amount,
            sum(if(dt='$do_date',order_original_amount,0)) order_last_1d_original_amount,
            sum(if(dt='$do_date',order_final_amount,0)) order_last_1d_final_amount,
            sum(if(dt>=date_add('$do_date',-6),order_count,0)) order_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),order_reduce_amount,0)) order_last_7d_reduce_amount,
            sum(if(dt>=date_add('$do_date',-6),order_original_amount,0)) order_last_7d_original_amount,
            sum(if(dt>=date_add('$do_date',-6),order_final_amount,0)) order_last_7d_final_amount,
            sum(if(dt>=date_add('$do_date',-29),order_count,0)) order_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),order_reduce_amount,0)) order_last_30d_reduce_amount,
            sum(if(dt>=date_add('$do_date',-29),order_original_amount,0)) order_last_30d_original_amount,
            sum(if(dt>=date_add('$do_date',-29),order_final_amount,0)) order_last_30d_final_amount,
            sum(order_count) order_count,
            sum(order_reduce_amount) order_reduce_amount,
            sum(order_original_amount) order_original_amount,
            sum(order_final_amount) order_final_amount,
            sum(if(dt='$do_date',payment_count,0)) payment_last_1d_count,
            sum(if(dt='$do_date',payment_reduce_amount,0)) payment_last_1d_reduce_amount,
            sum(if(dt='$do_date',payment_amount,0)) payment_last_1d_amount,
            sum(if(dt>=date_add('$do_date',-6),payment_count,0)) payment_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),payment_reduce_amount,0)) payment_last_7d_reduce_amount,
            sum(if(dt>=date_add('$do_date',-6),payment_amount,0)) payment_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-29),payment_count,0)) payment_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),payment_reduce_amount,0)) payment_last_30d_reduce_amount,
            sum(if(dt>=date_add('$do_date',-29),payment_amount,0)) payment_last_30d_amount,
            sum(payment_count) payment_count,
            sum(payment_reduce_amount) payment_reduce_amount,
            sum(payment_amount) payment_amount,
            sum(if(dt='$do_date',expire_count,0)) expire_last_1d_count,
            sum(if(dt>=date_add('$do_date',-6),expire_count,0)) expire_last_7d_count,
            sum(if(dt>=date_add('$do_date',-29),expire_count,0)) expire_last_30d_count,
            sum(expire_count) expire_count
        from ${APP}.dws_coupon_info_daycount
        group by coupon_id
    )t2
    on t1.id=t2.coupon_id;
    "
    
    dwt_activity_topic="
    insert overwrite table ${APP}.dwt_activity_topic partition(dt='$do_date')
    select
        t1.activity_rule_id,
        t1.activity_id,
        nvl(order_last_1d_count,0),
        nvl(order_last_1d_reduce_amount,0),
        nvl(order_last_1d_original_amount,0),
        nvl(order_last_1d_final_amount,0),
        nvl(order_count,0),
        nvl(order_reduce_amount,0),
        nvl(order_original_amount,0),
        nvl(order_final_amount,0),
        nvl(payment_last_1d_count,0),
        nvl(payment_last_1d_reduce_amount,0),
        nvl(payment_last_1d_amount,0),
        nvl(payment_count,0),
        nvl(payment_reduce_amount,0),
        nvl(payment_amount,0)
    from
    (
        select
            activity_rule_id,
            activity_id
        from ${APP}.dim_activity_rule_info
        where dt='$do_date'
    )t1
    left join
    (
        select
            activity_rule_id,
            activity_id,
            sum(if(dt='$do_date',order_count,0)) order_last_1d_count,
            sum(if(dt='$do_date',order_reduce_amount,0)) order_last_1d_reduce_amount,
            sum(if(dt='$do_date',order_original_amount,0)) order_last_1d_original_amount,
            sum(if(dt='$do_date',order_final_amount,0)) order_last_1d_final_amount,
            sum(order_count) order_count,
            sum(order_reduce_amount) order_reduce_amount,
            sum(order_original_amount) order_original_amount,
            sum(order_final_amount) order_final_amount,
            sum(if(dt='$do_date',payment_count,0)) payment_last_1d_count,
            sum(if(dt='$do_date',payment_reduce_amount,0)) payment_last_1d_reduce_amount,
            sum(if(dt='$do_date',payment_amount,0)) payment_last_1d_amount,
            sum(payment_count) payment_count,
            sum(payment_reduce_amount) payment_reduce_amount,
            sum(payment_amount) payment_amount
        from ${APP}.dws_activity_info_daycount
        group by activity_rule_id,activity_id
    )t2
    on t1.activity_rule_id=t2.activity_rule_id
    and t1.activity_id=t2.activity_id;
    "
    
    dwt_area_topic="
    insert overwrite table ${APP}.dwt_area_topic partition(dt='$do_date')
    select
        id,
        nvl(visit_last_1d_count,0),
        nvl(login_last_1d_count,0),
        nvl(visit_last_7d_count,0),
        nvl(login_last_7d_count,0),
        nvl(visit_last_30d_count,0),
        nvl(login_last_30d_count,0),
        nvl(visit_count,0),
        nvl(login_count,0),
        nvl(order_last_1d_count,0),
        nvl(order_last_1d_original_amount,0),
        nvl(order_last_1d_final_amount,0),
        nvl(order_last_7d_count,0),
        nvl(order_last_7d_original_amount,0),
        nvl(order_last_7d_final_amount,0),
        nvl(order_last_30d_count,0),
        nvl(order_last_30d_original_amount,0),
        nvl(order_last_30d_final_amount,0),
        nvl(order_count,0),
        nvl(order_original_amount,0),
        nvl(order_final_amount,0),
        nvl(payment_last_1d_count,0),
        nvl(payment_last_1d_amount,0),
        nvl(payment_last_7d_count,0),
        nvl(payment_last_7d_amount,0),
        nvl(payment_last_30d_count,0),
        nvl(payment_last_30d_amount,0),
        nvl(payment_count,0),
        nvl(payment_amount,0),
        nvl(refund_order_last_1d_count,0),
        nvl(refund_order_last_1d_amount,0),
        nvl(refund_order_last_7d_count,0),
        nvl(refund_order_last_7d_amount,0),
        nvl(refund_order_last_30d_count,0),
        nvl(refund_order_last_30d_amount,0),
        nvl(refund_order_count,0),
        nvl(refund_order_amount,0),
        nvl(refund_payment_last_1d_count,0),
        nvl(refund_payment_last_1d_amount,0),
        nvl(refund_payment_last_7d_count,0),
        nvl(refund_payment_last_7d_amount,0),
        nvl(refund_payment_last_30d_count,0),
        nvl(refund_payment_last_30d_amount,0),
        nvl(refund_payment_count,0),
        nvl(refund_payment_amount,0)
    from
    (
        select
            id
        from ${APP}.dim_base_province
    )t1
    left join
    (
        select
            province_id province_id,
            sum(if(dt='$do_date',visit_count,0)) visit_last_1d_count,
            sum(if(dt='$do_date',login_count,0)) login_last_1d_count,
            sum(if(dt>=date_add('$do_date',-6),visit_count,0)) visit_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),login_count,0)) login_last_7d_count,
            sum(if(dt>=date_add('$do_date',-29),visit_count,0)) visit_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),login_count,0)) login_last_30d_count,
            sum(visit_count) visit_count,
            sum(login_count) login_count,
            sum(if(dt='$do_date',order_count,0)) order_last_1d_count,
            sum(if(dt='$do_date',order_original_amount,0)) order_last_1d_original_amount,
            sum(if(dt='$do_date',order_final_amount,0)) order_last_1d_final_amount,
            sum(if(dt>=date_add('$do_date',-6),order_count,0)) order_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),order_original_amount,0)) order_last_7d_original_amount,
            sum(if(dt>=date_add('$do_date',-6),order_final_amount,0)) order_last_7d_final_amount,
            sum(if(dt>=date_add('$do_date',-29),order_count,0)) order_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),order_original_amount,0)) order_last_30d_original_amount,
            sum(if(dt>=date_add('$do_date',-29),order_final_amount,0)) order_last_30d_final_amount,
            sum(order_count) order_count,
            sum(order_original_amount) order_original_amount,
            sum(order_final_amount) order_final_amount,
            sum(if(dt='$do_date',payment_count,0)) payment_last_1d_count,
            sum(if(dt='$do_date',payment_amount,0)) payment_last_1d_amount,
            sum(if(dt>=date_add('$do_date',-6),payment_count,0)) payment_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),payment_amount,0)) payment_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-29),payment_count,0)) payment_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),payment_amount,0)) payment_last_30d_amount,
            sum(payment_count) payment_count,
            sum(payment_amount) payment_amount,
            sum(if(dt='$do_date',refund_order_count,0)) refund_order_last_1d_count,
            sum(if(dt='$do_date',refund_order_amount,0)) refund_order_last_1d_amount,
            sum(if(dt>=date_add('$do_date',-6),refund_order_count,0)) refund_order_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),refund_order_amount,0)) refund_order_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-29),refund_order_count,0)) refund_order_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),refund_order_amount,0)) refund_order_last_30d_amount,
            sum(refund_order_count) refund_order_count,
            sum(refund_order_amount) refund_order_amount,
            sum(if(dt='$do_date',refund_payment_count,0)) refund_payment_last_1d_count,
            sum(if(dt='$do_date',refund_payment_amount,0)) refund_payment_last_1d_amount,
            sum(if(dt>=date_add('$do_date',-6),refund_payment_count,0)) refund_payment_last_7d_count,
            sum(if(dt>=date_add('$do_date',-6),refund_payment_amount,0)) refund_payment_last_7d_amount,
            sum(if(dt>=date_add('$do_date',-29),refund_payment_count,0)) refund_payment_last_30d_count,
            sum(if(dt>=date_add('$do_date',-29),refund_payment_amount,0)) refund_payment_last_30d_amount,
            sum(refund_payment_count) refund_payment_count,
            sum(refund_payment_amount) refund_payment_amount
        from ${APP}.dws_area_stats_daycount
        group by province_id
    )t2
    on t1.id=t2.province_id;
    "
    
    
    case $1 in
        "dwt_visitor_topic" )
            hive -e "$dwt_visitor_topic"
        ;;
        "dwt_user_topic" )
            hive -e "$dwt_user_topic"
        ;;
        "dwt_sku_topic" )
            hive -e "$dwt_sku_topic"
        ;;
        "dwt_activity_topic" )
            hive -e "$dwt_activity_topic"
        ;;
        "dwt_coupon_topic" )
            hive -e "$dwt_coupon_topic"
        ;;
        "dwt_area_topic" )
            hive -e "$dwt_area_topic"
        ;;
        "all" )
            hive -e "$dwt_visitor_topic$dwt_user_topic$dwt_sku_topic$dwt_activity_topic$dwt_coupon_topic$dwt_area_topic"
        ;;
    esac
    
  2. Add execute permission to the script: chmod +x dws_to_dwt_init.sh

  3. Run the script

    dws_to_dwt_init.sh all 2020-06-14
    

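The init script derives its 1-, 7- and 30-day metrics by comparing dt against '$do_date', date_add('$do_date',-6) and date_add('$do_date',-29), so each window includes the run date itself. As a rough sketch of which partitions that covers, the helper below prints the first day of an N-day window ending on a given date. The name window_start is hypothetical (it is not part of the original script), and the sketch assumes GNU date:

```shell
#!/bin/bash
# window_start END_DATE DAYS
# Hypothetical helper, for illustration only (assumes GNU date).
# Prints the first day of a DAYS-long window that ends on END_DATE, inclusive.
window_start() {
    local end_date=$1 days=$2
    # An inclusive N-day window starts N-1 days before its last day,
    # which matches date_add('$do_date',-6) and date_add('$do_date',-29).
    date -d "$end_date $((days - 1)) days ago" +%F
}

window_start 2020-06-14 1     # prints 2020-06-14
window_start 2020-06-14 7     # prints 2020-06-08
window_start 2020-06-14 30    # prints 2020-05-16
```

For the run shown above (2020-06-14), the 7-day columns therefore sum the dws partitions from 2020-06-08 onward and the 30-day columns those from 2020-05-16 onward.
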
# DWT Layer Daily Data Load Script

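Unlike the init script, which re-aggregates every dws partition, the daily script below appears to update each rolling metric incrementally. It takes the previous day's dwt value, adds the run date's dws value, and subtracts the value of the day that drops out of the window (date_add('$do_date',-7) or date_add('$do_date',-30)). A minimal sketch of that arithmetic follows. The roll_window helper is hypothetical (it is not part of the original script) and treats missing inputs as 0 the way nvl(...,0) does:

```shell
#!/bin/bash
# roll_window OLD_TOTAL TODAY LEAVING
# Hypothetical illustration only: slides an N-day running total forward one day.
roll_window() {
    # Empty or missing arguments default to 0, mirroring nvl(x,0) in the HiveQL
    local old_total=${1:-0} today=${2:-0} leaving=${3:-0}
    echo $(( old_total + today - leaving ))
}

roll_window 12 3 5     # prints 10
roll_window "" 3 ""    # prints 3
```

The first call corresponds to a visitor whose 7-day total was 12 yesterday, who visited 3 times on the run date, and whose 5 visits from seven days earlier leave the window. The second corresponds to a visitor with no previous dwt row and no activity on the leaving day.
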
  1. Create the script dws_to_dwt.sh in the /home/damoncai/bin directory

    #!/bin/bash
    
    APP=gmall
    # Use the date passed as the second argument if present; otherwise default to yesterday
    if [ -n "$2" ] ;then
        do_date=$2
    else 
        do_date=`date -d "-1 day" +%F`
    fi
    
    clear_date=`date -d "$do_date -2 day" +%F`
    
    dwt_visitor_topic="
    insert overwrite table ${APP}.dwt_visitor_topic partition(dt='$do_date')
    select
        nvl(1d_ago.mid_id,old.mid_id),
        nvl(1d_ago.brand,old.brand),
        nvl(1d_ago.model,old.model),
        nvl(1d_ago.channel,old.channel),
        nvl(1d_ago.os,old.os),
        nvl(1d_ago.area_code,old.area_code),
        nvl(1d_ago.version_code,old.version_code),
        case when old.mid_id is null and 1d_ago.is_new=1 then '$do_date'
             when old.mid_id is null and 1d_ago.is_new=0 then '2020-06-13' --the exact first visit date is unknown, so use a date before the warehouse go-live day
             else old.visit_date_first end,
        if(1d_ago.mid_id is not null,'$do_date',old.visit_date_last),
        nvl(1d_ago.visit_count,0),
        if(1d_ago.mid_id is null,0,1),
        nvl(old.visit_last_7d_count,0)+nvl(1d_ago.visit_count,0)- nvl(7d_ago.visit_count,0),
        nvl(old.visit_last_7d_day_count,0)+if(1d_ago.mid_id is null,0,1)- if(7d_ago.mid_id is null,0,1),
        nvl(old.visit_last_30d_count,0)+nvl(1d_ago.visit_count,0)- nvl(30d_ago.visit_count,0),
        nvl(old.visit_last_30d_day_count,0)+if(1d_ago.mid_id is null,0,1)- if(30d_ago.mid_id is null,0,1),
        nvl(old.visit_count,0)+nvl(1d_ago.visit_count,0),
        nvl(old.visit_day_count,0)+if(1d_ago.mid_id is null,0,1)
    from
    (
        select
            mid_id,
            brand,
            model,
            channel,
            os,
            area_code,
            version_code,
            visit_date_first,
            visit_date_last,
            visit_last_1d_count,
            visit_last_1d_day_count,
            visit_last_7d_count,
            visit_last_7d_day_count,
            visit_last_30d_count,
            visit_last_30d_day_count,
            visit_count,
            visit_day_count
        from ${APP}.dwt_visitor_topic
        where dt=date_add('$do_date',-1)
    )old
    full outer join
    (
        select
            mid_id,
            brand,
            model,
            is_new,
            channel,
            os,
            area_code,
            version_code,
            visit_count
        from ${APP}.dws_visitor_action_daycount
        where dt='$do_date'
    )1d_ago
    on old.mid_id=1d_ago.mid_id
    left join
    (
        select
            mid_id,
            brand,
            model,
            is_new,
            channel,
            os,
            area_code,
            version_code,
            visit_count
        from ${APP}.dws_visitor_action_daycount
        where dt=date_add('$do_date',-7)
    )7d_ago
    on old.mid_id=7d_ago.mid_id
    left join
    (
        select
            mid_id,
            brand,
            model,
            is_new,
            channel,
            os,
            area_code,
            version_code,
            visit_count
        from ${APP}.dws_visitor_action_daycount
        where dt=date_add('$do_date',-30)
    )30d_ago
    on old.mid_id=30d_ago.mid_id;
    alter table ${APP}.dwt_visitor_topic drop partition(dt='$clear_date');
    "
    
    dwt_user_topic="
    insert overwrite table ${APP}.dwt_user_topic partition(dt='$do_date')
    select
        nvl(1d_ago.user_id,old.user_id),
        nvl(old.login_date_first,'$do_date'),
        if(1d_ago.user_id is not null,'$do_date',old.login_date_last),
        nvl(1d_ago.login_count,0),
        if(1d_ago.user_id is not null,1,0),
        nvl(old.login_last_7d_count,0)+nvl(1d_ago.login_count,0)- nvl(7d_ago.login_count,0),
        nvl(old.login_last_7d_day_count,0)+if(1d_ago.user_id is null,0,1)- if(7d_ago.user_id is null,0,1),
        nvl(old.login_last_30d_count,0)+nvl(1d_ago.login_count,0)- nvl(30d_ago.login_count,0),
        nvl(old.login_last_30d_day_count,0)+if(1d_ago.user_id is null,0,1)- if(30d_ago.user_id is null,0,1),
        nvl(old.login_count,0)+nvl(1d_ago.login_count,0),
        nvl(old.login_day_count,0)+if(1d_ago.user_id is not null,1,0),
        if(old.order_date_first is null and 1d_ago.order_count>0, '$do_date', old.order_date_first),
        if(1d_ago.order_count>0,'$do_date',old.order_date_last),
        nvl(1d_ago.order_count,0),
        nvl(1d_ago.order_activity_count,0),
        nvl(1d_ago.order_activity_reduce_amount,0.0),
        nvl(1d_ago.order_coupon_count,0),
        nvl(1d_ago.order_coupon_reduce_amount,0.0),
        nvl(1d_ago.order_original_amount,0.0),
        nvl(1d_ago.order_final_amount,0.0),
        nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)- nvl(7d_ago.order_count,0),
        nvl(old.order_activity_last_7d_count,0)+nvl(1d_ago.order_activity_count,0)- nvl(7d_ago.order_activity_count,0),
        nvl(old.order_activity_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)- nvl(7d_ago.order_activity_reduce_amount,0.0),
        nvl(old.order_coupon_last_7d_count,0)+nvl(1d_ago.order_coupon_count,0)- nvl(7d_ago.order_coupon_count,0),
        nvl(old.order_coupon_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)- nvl(7d_ago.order_coupon_reduce_amount,0.0),
        nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(7d_ago.order_original_amount,0.0),
        nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(7d_ago.order_final_amount,0.0),
        nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)- nvl(30d_ago.order_count,0),
        nvl(old.order_activity_last_30d_count,0)+nvl(1d_ago.order_activity_count,0)- nvl(30d_ago.order_activity_count,0),
        nvl(old.order_activity_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)- nvl(30d_ago.order_activity_reduce_amount,0.0),
        nvl(old.order_coupon_last_30d_count,0)+nvl(1d_ago.order_coupon_count,0)- nvl(30d_ago.order_coupon_count,0),
        nvl(old.order_coupon_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)- nvl(30d_ago.order_coupon_reduce_amount,0.0),
        nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(30d_ago.order_original_amount,0.0),
        nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(30d_ago.order_final_amount,0.0),
        nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
        nvl(old.order_activity_count,0)+nvl(1d_ago.order_activity_count,0),
        nvl(old.order_activity_reduce_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0),
        nvl(old.order_coupon_count,0)+nvl(1d_ago.order_coupon_count,0),
        nvl(old.order_coupon_reduce_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0),
        nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
        nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
        if(old.payment_date_first is null and 1d_ago.payment_count>0, '$do_date', old.payment_date_first),
        if(1d_ago.payment_count>0,'$do_date',old.payment_date_last),
        nvl(1d_ago.payment_count,0),
        nvl(1d_ago.payment_amount,0.0),
        nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)-nvl(7d_ago.payment_count,0),
        nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)-nvl(7d_ago.payment_amount,0.0),
        nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)-nvl(30d_ago.payment_count,0),
        nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(30d_ago.payment_amount,0.0),
        nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
        nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
        nvl(1d_ago.refund_order_count,0),
        nvl(1d_ago.refund_order_num,0),
        nvl(1d_ago.refund_order_amount,0.0),
        nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(7d_ago.refund_order_count,0),
        nvl(old.refund_order_last_7d_num,0)+nvl(1d_ago.refund_order_num, 0)- nvl(7d_ago.refund_order_num,0),
        nvl(old.refund_order_last_7d_amount,0.0)+ nvl(1d_ago.refund_order_amount,0.0)- nvl(7d_ago.refund_order_amount,0.0),
        nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(30d_ago.refund_order_count,0),
        nvl(old.refund_order_last_30d_num,0)+nvl(1d_ago.refund_order_num, 0)- nvl(30d_ago.refund_order_num,0),
        nvl(old.refund_order_last_30d_amount,0.0)+ nvl(1d_ago.refund_order_amount,0.0)- nvl(30d_ago.refund_order_amount,0.0),
        nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
        nvl(old.refund_order_num,0)+nvl(1d_ago.refund_order_num,0),
        nvl(old.refund_order_amount,0.0)+ nvl(1d_ago.refund_order_amount,0.0),
        nvl(1d_ago.refund_payment_count,0),
        nvl(1d_ago.refund_payment_num,0),
        nvl(1d_ago.refund_payment_amount,0.0),
        nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(7d_ago.refund_payment_count,0),
        nvl(old.refund_payment_last_7d_num,0)+nvl(1d_ago.refund_payment_num,0)- nvl(7d_ago.refund_payment_num,0),
        nvl(old.refund_payment_last_7d_amount,0.0)+ nvl(1d_ago.refund_payment_amount,0.0)- nvl(7d_ago.refund_payment_amount,0.0),
        nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)-nvl(30d_ago.refund_payment_count,0),
        nvl(old.refund_payment_last_30d_num,0)+nvl(1d_ago.refund_payment_num,0)- nvl(30d_ago.refund_payment_num,0),
        nvl(old.refund_payment_last_30d_amount,0.0)+ nvl(1d_ago.refund_payment_amount,0.0)- nvl(30d_ago.refund_payment_amount,0.0),
        nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
        nvl(old.refund_payment_num,0)+nvl(1d_ago.refund_payment_num,0),
        nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0),
        nvl(1d_ago.cart_count,0),
        nvl(old.cart_last_7d_count,0)+nvl(1d_ago.cart_count,0)-nvl(7d_ago.cart_count,0),
        nvl(old.cart_last_30d_count,0)+nvl(1d_ago.cart_count,0)-nvl(30d_ago.cart_count,0),
        nvl(old.cart_count,0)+nvl(1d_ago.cart_count,0),
        nvl(1d_ago.favor_count,0),
        nvl(old.favor_last_7d_count,0)+nvl(1d_ago.favor_count,0)- nvl(7d_ago.favor_count,0),
        nvl(old.favor_last_30d_count,0)+nvl(1d_ago.favor_count,0)- nvl(30d_ago.favor_count,0),
        nvl(old.favor_count,0)+nvl(1d_ago.favor_count,0),
        nvl(1d_ago.coupon_get_count,0),
        nvl(1d_ago.coupon_using_count,0),
        nvl(1d_ago.coupon_used_count,0),
        nvl(old.coupon_last_7d_get_count,0)+nvl(1d_ago.coupon_get_count,0)- nvl(7d_ago.coupon_get_count,0),
        nvl(old.coupon_last_7d_using_count,0)+nvl(1d_ago.coupon_using_count,0)- nvl(7d_ago.coupon_using_count,0),
        nvl(old.coupon_last_7d_used_count,0)+ nvl(1d_ago.coupon_used_count,0)- nvl(7d_ago.coupon_used_count,0),
        nvl(old.coupon_last_30d_get_count,0)+nvl(1d_ago.coupon_get_count,0)- nvl(30d_ago.coupon_get_count,0),
        nvl(old.coupon_last_30d_using_count,0)+nvl(1d_ago.coupon_using_count,0)- nvl(30d_ago.coupon_using_count,0),
        nvl(old.coupon_last_30d_used_count,0)+ nvl(1d_ago.coupon_used_count,0)- nvl(30d_ago.coupon_used_count,0),
        nvl(old.coupon_get_count,0)+nvl(1d_ago.coupon_get_count,0),
        nvl(old.coupon_using_count,0)+nvl(1d_ago.coupon_using_count,0),
        nvl(old.coupon_used_count,0)+nvl(1d_ago.coupon_used_count,0),
        nvl(1d_ago.appraise_good_count,0),
        nvl(1d_ago.appraise_mid_count,0),
        nvl(1d_ago.appraise_bad_count,0),
        nvl(1d_ago.appraise_default_count,0),
        nvl(old.appraise_last_7d_good_count,0)+nvl(1d_ago.appraise_good_count,0)- nvl(7d_ago.appraise_good_count,0),
        nvl(old.appraise_last_7d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(7d_ago.appraise_mid_count,0),
        nvl(old.appraise_last_7d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(7d_ago.appraise_bad_count,0),
        nvl(old.appraise_last_7d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(7d_ago.appraise_default_count,0),
        nvl(old.appraise_last_30d_good_count,0)+nvl(1d_ago.appraise_good_count,0)- nvl(30d_ago.appraise_good_count,0),
        nvl(old.appraise_last_30d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)-nvl(30d_ago.appraise_mid_count,0),
        nvl(old.appraise_last_30d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)-nvl(30d_ago.appraise_bad_count,0),
        nvl(old.appraise_last_30d_default_count,0)+nvl(1d_ago.appraise_default_count,0)-nvl(30d_ago.appraise_default_count,0),
        nvl(old.appraise_good_count,0)+nvl(1d_ago.appraise_good_count,0),
        nvl(old.appraise_mid_count,0)+nvl(1d_ago.appraise_mid_count,0),
        nvl(old.appraise_bad_count,0)+nvl(1d_ago.appraise_bad_count,0),
        nvl(old.appraise_default_count,0)+nvl(1d_ago.appraise_default_count,0)
    from
    (
        select
            user_id,
            login_date_first,
            login_date_last,
            login_date_1d_count,
            login_last_1d_day_count,
            login_last_7d_count,
            login_last_7d_day_count,
            login_last_30d_count,
            login_last_30d_day_count,
            login_count,
            login_day_count,
            order_date_first,
            order_date_last,
            order_last_1d_count,
            order_activity_last_1d_count,
            order_activity_reduce_last_1d_amount,
            order_coupon_last_1d_count,
            order_coupon_reduce_last_1d_amount,
            order_last_1d_original_amount,
            order_last_1d_final_amount,
            order_last_7d_count,
            order_activity_last_7d_count,
            order_activity_reduce_last_7d_amount,
            order_coupon_last_7d_count,
            order_coupon_reduce_last_7d_amount,
            order_last_7d_original_amount,
            order_last_7d_final_amount,
            order_last_30d_count,
            order_activity_last_30d_count,
            order_activity_reduce_last_30d_amount,
            order_coupon_last_30d_count,
            order_coupon_reduce_last_30d_amount,
            order_last_30d_original_amount,
            order_last_30d_final_amount,
            order_count,
            order_activity_count,
            order_activity_reduce_amount,
            order_coupon_count,
            order_coupon_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_date_first,
            payment_date_last,
            payment_last_1d_count,
            payment_last_1d_amount,
            payment_last_7d_count,
            payment_last_7d_amount,
            payment_last_30d_count,
            payment_last_30d_amount,
            payment_count,
            payment_amount,
            refund_order_last_1d_count,
            refund_order_last_1d_num,
            refund_order_last_1d_amount,
            refund_order_last_7d_count,
            refund_order_last_7d_num,
            refund_order_last_7d_amount,
            refund_order_last_30d_count,
            refund_order_last_30d_num,
            refund_order_last_30d_amount,
            refund_order_count,
            refund_order_num,
            refund_order_amount,
            refund_payment_last_1d_count,
            refund_payment_last_1d_num,
            refund_payment_last_1d_amount,
            refund_payment_last_7d_count,
            refund_payment_last_7d_num,
            refund_payment_last_7d_amount,
            refund_payment_last_30d_count,
            refund_payment_last_30d_num,
            refund_payment_last_30d_amount,
            refund_payment_count,
            refund_payment_num,
            refund_payment_amount,
            cart_last_1d_count,
            cart_last_7d_count,
            cart_last_30d_count,
            cart_count,
            favor_last_1d_count,
            favor_last_7d_count,
            favor_last_30d_count,
            favor_count,
            coupon_last_1d_get_count,
            coupon_last_1d_using_count,
            coupon_last_1d_used_count,
            coupon_last_7d_get_count,
            coupon_last_7d_using_count,
            coupon_last_7d_used_count,
            coupon_last_30d_get_count,
            coupon_last_30d_using_count,
            coupon_last_30d_used_count,
            coupon_get_count,
            coupon_using_count,
            coupon_used_count,
            appraise_last_1d_good_count,
            appraise_last_1d_mid_count,
            appraise_last_1d_bad_count,
            appraise_last_1d_default_count,
            appraise_last_7d_good_count,
            appraise_last_7d_mid_count,
            appraise_last_7d_bad_count,
            appraise_last_7d_default_count,
            appraise_last_30d_good_count,
            appraise_last_30d_mid_count,
            appraise_last_30d_bad_count,
            appraise_last_30d_default_count,
            appraise_good_count,
            appraise_mid_count,
            appraise_bad_count,
            appraise_default_count
        from ${APP}.dwt_user_topic
        where dt=date_add('$do_date',-1)
    )old
    full outer join
    (
        select
            user_id,
            login_count,
            cart_count,
            favor_count,
            order_count,
            order_activity_count,
            order_activity_reduce_amount,
            order_coupon_count,
            order_coupon_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_amount,
            refund_order_count,
            refund_order_num,
            refund_order_amount,
            refund_payment_count,
            refund_payment_num,
            refund_payment_amount,
            coupon_get_count,
            coupon_using_count,
            coupon_used_count,
            appraise_good_count,
            appraise_mid_count,
            appraise_bad_count,
            appraise_default_count
        from ${APP}.dws_user_action_daycount
        where dt='$do_date'
    )1d_ago
    on old.user_id=1d_ago.user_id
    left join
    (
        select
            user_id,
            login_count,
            cart_count,
            favor_count,
            order_count,
            order_activity_count,
            order_activity_reduce_amount,
            order_coupon_count,
            order_coupon_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_amount,
            refund_order_count,
            refund_order_num,
            refund_order_amount,
            refund_payment_count,
            refund_payment_num,
            refund_payment_amount,
            coupon_get_count,
            coupon_using_count,
            coupon_used_count,
            appraise_good_count,
            appraise_mid_count,
            appraise_bad_count,
            appraise_default_count
        from ${APP}.dws_user_action_daycount
        where dt=date_add('$do_date',-7)
    )7d_ago
    on old.user_id=7d_ago.user_id
    left join
    (
        select
            user_id,
            login_count,
            cart_count,
            favor_count,
            order_count,
            order_activity_count,
            order_activity_reduce_amount,
            order_coupon_count,
            order_coupon_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_amount,
            refund_order_count,
            refund_order_num,
            refund_order_amount,
            refund_payment_count,
            refund_payment_num,
            refund_payment_amount,
            coupon_get_count,
            coupon_using_count,
            coupon_used_count,
            appraise_good_count,
            appraise_mid_count,
            appraise_bad_count,
            appraise_default_count
        from ${APP}.dws_user_action_daycount
        where dt=date_add('$do_date',-30)
    )30d_ago
    on old.user_id=30d_ago.user_id;
    alter table ${APP}.dwt_user_topic drop partition(dt='$clear_date');
    "
    
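    # SKU topic: merge yesterday's dwt_sku_topic snapshot (old) with today's dws_sku_action_daycount (1d_ago).
    # Rolling windows: last_Nd(today) = last_Nd(yesterday) + count(today) - count(N days ago).
    dwt_sku_topic="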
    insert overwrite table ${APP}.dwt_sku_topic partition(dt='$do_date')
    select
        nvl(1d_ago.sku_id,old.sku_id),
        nvl(1d_ago.order_count,0),
        nvl(1d_ago.order_num,0),
        nvl(1d_ago.order_activity_count,0),
        nvl(1d_ago.order_coupon_count,0),
        nvl(1d_ago.order_activity_reduce_amount,0.0),
        nvl(1d_ago.order_coupon_reduce_amount,0.0),
        nvl(1d_ago.order_original_amount,0.0),
        nvl(1d_ago.order_final_amount,0.0),
        nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)- nvl(7d_ago.order_count,0),
        nvl(old.order_last_7d_num,0)+nvl(1d_ago.order_num,0)- nvl(7d_ago.order_num,0),
        nvl(old.order_activity_last_7d_count,0)+nvl(1d_ago.order_activity_count,0)- nvl(7d_ago.order_activity_count,0),
        nvl(old.order_coupon_last_7d_count,0)+nvl(1d_ago.order_coupon_count,0)- nvl(7d_ago.order_coupon_count,0),
        nvl(old.order_activity_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)- nvl(7d_ago.order_activity_reduce_amount,0.0),
        nvl(old.order_coupon_reduce_last_7d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)- nvl(7d_ago.order_coupon_reduce_amount,0.0),
        nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(7d_ago.order_original_amount,0.0),
        nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(7d_ago.order_final_amount,0.0),
        nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)- nvl(30d_ago.order_count,0),
        nvl(old.order_last_30d_num,0)+nvl(1d_ago.order_num,0)- nvl(30d_ago.order_num,0),
        nvl(old.order_activity_last_30d_count,0)+nvl(1d_ago.order_activity_count,0)- nvl(30d_ago.order_activity_count,0),
        nvl(old.order_coupon_last_30d_count,0)+nvl(1d_ago.order_coupon_count,0)- nvl(30d_ago.order_coupon_count,0),
        nvl(old.order_activity_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0)- nvl(30d_ago.order_activity_reduce_amount,0.0),
        nvl(old.order_coupon_reduce_last_30d_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0)- nvl(30d_ago.order_coupon_reduce_amount,0.0),
        nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(30d_ago.order_original_amount,0.0),
        nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(30d_ago.order_final_amount,0.0),
        nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
        nvl(old.order_num,0)+nvl(1d_ago.order_num,0),
        nvl(old.order_activity_count,0)+nvl(1d_ago.order_activity_count,0),
        nvl(old.order_coupon_count,0)+nvl(1d_ago.order_coupon_count,0),
        nvl(old.order_activity_reduce_amount,0.0)+nvl(1d_ago.order_activity_reduce_amount,0.0),
        nvl(old.order_coupon_reduce_amount,0.0)+nvl(1d_ago.order_coupon_reduce_amount,0.0),
        nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
        nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
        nvl(1d_ago.payment_count,0),
        nvl(1d_ago.payment_num,0),
        nvl(1d_ago.payment_amount,0.0),
        nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)- nvl(7d_ago.payment_count,0),
        nvl(old.payment_last_7d_num,0)+nvl(1d_ago.payment_num,0)- nvl(7d_ago.payment_num,0),
        nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(7d_ago.payment_amount,0.0),
        nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)- nvl(30d_ago.payment_count,0),
        nvl(old.payment_last_30d_num,0)+nvl(1d_ago.payment_num,0)- nvl(30d_ago.payment_num,0),
        nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(30d_ago.payment_amount,0.0),
        nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
        nvl(old.payment_num,0)+nvl(1d_ago.payment_num,0),
        nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
        nvl(1d_ago.refund_order_count,0),
        nvl(1d_ago.refund_order_num,0),
        nvl(1d_ago.refund_order_amount,0.0),
        nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(7d_ago.refund_order_count,0),
        nvl(old.refund_order_last_7d_num,0)+nvl(1d_ago.refund_order_num,0)- nvl(7d_ago.refund_order_num,0),
        nvl(old.refund_order_last_7d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)- nvl(7d_ago.refund_order_amount,0.0),
        nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(30d_ago.refund_order_count,0),
        nvl(old.refund_order_last_30d_num,0)+nvl(1d_ago.refund_order_num,0)- nvl(30d_ago.refund_order_num,0),
        nvl(old.refund_order_last_30d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)- nvl(30d_ago.refund_order_amount,0.0),
        nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
        nvl(old.refund_order_num,0)+nvl(1d_ago.refund_order_num,0),
        nvl(old.refund_order_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0),
        nvl(1d_ago.refund_payment_count,0),
        nvl(1d_ago.refund_payment_num,0),
        nvl(1d_ago.refund_payment_amount,0.0),
        nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)- nvl(7d_ago.refund_payment_count,0),
        nvl(old.refund_payment_last_7d_num,0)+nvl(1d_ago.refund_payment_num,0)- nvl(7d_ago.refund_payment_num,0),
        nvl(old.refund_payment_last_7d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)- nvl(7d_ago.refund_payment_amount,0.0),
        nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)- nvl(30d_ago.refund_payment_count,0),
        nvl(old.refund_payment_last_30d_num,0)+nvl(1d_ago.refund_payment_num,0)- nvl(30d_ago.refund_payment_num,0),
        nvl(old.refund_payment_last_30d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)- nvl(30d_ago.refund_payment_amount,0.0),
        nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
        nvl(old.refund_payment_num,0)+nvl(1d_ago.refund_payment_num,0),
        nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0),
        nvl(1d_ago.cart_count,0),
        nvl(old.cart_last_7d_count,0)+nvl(1d_ago.cart_count,0)- nvl(7d_ago.cart_count,0),
        nvl(old.cart_last_30d_count,0)+nvl(1d_ago.cart_count,0)- nvl(30d_ago.cart_count,0),
        nvl(old.cart_count,0)+nvl(1d_ago.cart_count,0),
        nvl(1d_ago.favor_count,0),
        nvl(old.favor_last_7d_count,0)+nvl(1d_ago.favor_count,0)- nvl(7d_ago.favor_count,0),
        nvl(old.favor_last_30d_count,0)+nvl(1d_ago.favor_count,0)- nvl(30d_ago.favor_count,0),
        nvl(old.favor_count,0)+nvl(1d_ago.favor_count,0),
        nvl(1d_ago.appraise_good_count,0),
        nvl(1d_ago.appraise_mid_count,0),
        nvl(1d_ago.appraise_bad_count,0),
        nvl(1d_ago.appraise_default_count,0),
        nvl(old.appraise_last_7d_good_count,0)+nvl(1d_ago.appraise_good_count,0)- nvl(7d_ago.appraise_good_count,0),
        nvl(old.appraise_last_7d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)- nvl(7d_ago.appraise_mid_count,0),
        nvl(old.appraise_last_7d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)- nvl(7d_ago.appraise_bad_count,0),
        nvl(old.appraise_last_7d_default_count,0)+nvl(1d_ago.appraise_default_count,0)- nvl(7d_ago.appraise_default_count,0),
        nvl(old.appraise_last_30d_good_count,0)+nvl(1d_ago.appraise_good_count,0)- nvl(30d_ago.appraise_good_count,0),
        nvl(old.appraise_last_30d_mid_count,0)+nvl(1d_ago.appraise_mid_count,0)- nvl(30d_ago.appraise_mid_count,0),
        nvl(old.appraise_last_30d_bad_count,0)+nvl(1d_ago.appraise_bad_count,0)- nvl(30d_ago.appraise_bad_count,0),
        nvl(old.appraise_last_30d_default_count,0)+nvl(1d_ago.appraise_default_count,0)- nvl(30d_ago.appraise_default_count,0),
        nvl(old.appraise_good_count,0)+nvl(1d_ago.appraise_good_count,0),
        nvl(old.appraise_mid_count,0)+nvl(1d_ago.appraise_mid_count,0),
        nvl(old.appraise_bad_count,0)+nvl(1d_ago.appraise_bad_count,0),
        nvl(old.appraise_default_count,0)+nvl(1d_ago.appraise_default_count,0)
    from
    (
        select
            sku_id,
            order_last_1d_count,
            order_last_1d_num,
            order_activity_last_1d_count,
            order_coupon_last_1d_count,
            order_activity_reduce_last_1d_amount,
            order_coupon_reduce_last_1d_amount,
            order_last_1d_original_amount,
            order_last_1d_final_amount,
            order_last_7d_count,
            order_last_7d_num,
            order_activity_last_7d_count,
            order_coupon_last_7d_count,
            order_activity_reduce_last_7d_amount,
            order_coupon_reduce_last_7d_amount,
            order_last_7d_original_amount,
            order_last_7d_final_amount,
            order_last_30d_count,
            order_last_30d_num,
            order_activity_last_30d_count,
            order_coupon_last_30d_count,
            order_activity_reduce_last_30d_amount,
            order_coupon_reduce_last_30d_amount,
            order_last_30d_original_amount,
            order_last_30d_final_amount,
            order_count,
            order_num,
            order_activity_count,
            order_coupon_count,
            order_activity_reduce_amount,
            order_coupon_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_last_1d_count,
            payment_last_1d_num,
            payment_last_1d_amount,
            payment_last_7d_count,
            payment_last_7d_num,
            payment_last_7d_amount,
            payment_last_30d_count,
            payment_last_30d_num,
            payment_last_30d_amount,
            payment_count,
            payment_num,
            payment_amount,
            refund_order_last_1d_count,
            refund_order_last_1d_num,
            refund_order_last_1d_amount,
            refund_order_last_7d_count,
            refund_order_last_7d_num,
            refund_order_last_7d_amount,
            refund_order_last_30d_count,
            refund_order_last_30d_num,
            refund_order_last_30d_amount,
            refund_order_count,
            refund_order_num,
            refund_order_amount,
            refund_payment_last_1d_count,
            refund_payment_last_1d_num,
            refund_payment_last_1d_amount,
            refund_payment_last_7d_count,
            refund_payment_last_7d_num,
            refund_payment_last_7d_amount,
            refund_payment_last_30d_count,
            refund_payment_last_30d_num,
            refund_payment_last_30d_amount,
            refund_payment_count,
            refund_payment_num,
            refund_payment_amount,
            cart_last_1d_count,
            cart_last_7d_count,
            cart_last_30d_count,
            cart_count,
            favor_last_1d_count,
            favor_last_7d_count,
            favor_last_30d_count,
            favor_count,
            appraise_last_1d_good_count,
            appraise_last_1d_mid_count,
            appraise_last_1d_bad_count,
            appraise_last_1d_default_count,
            appraise_last_7d_good_count,
            appraise_last_7d_mid_count,
            appraise_last_7d_bad_count,
            appraise_last_7d_default_count,
            appraise_last_30d_good_count,
            appraise_last_30d_mid_count,
            appraise_last_30d_bad_count,
            appraise_last_30d_default_count,
            appraise_good_count,
            appraise_mid_count,
            appraise_bad_count,
            appraise_default_count
        from ${APP}.dwt_sku_topic
        where dt=date_add('$do_date',-1)
    )old
    full outer join
    (
        select
            sku_id,
            order_count,
            order_num,
            order_activity_count,
            order_coupon_count,
            order_activity_reduce_amount,
            order_coupon_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_num,
            payment_amount,
            refund_order_count,
            refund_order_num,
            refund_order_amount,
            refund_payment_count,
            refund_payment_num,
            refund_payment_amount,
            cart_count,
            favor_count,
            appraise_good_count,
            appraise_mid_count,
            appraise_bad_count,
            appraise_default_count
        from ${APP}.dws_sku_action_daycount
        where dt='$do_date'
    )1d_ago
    on old.sku_id=1d_ago.sku_id
    left join
    (
        select
            sku_id,
            order_count,
            order_num,
            order_activity_count,
            order_coupon_count,
            order_activity_reduce_amount,
            order_coupon_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_num,
            payment_amount,
            refund_order_count,
            refund_order_num,
            refund_order_amount,
            refund_payment_count,
            refund_payment_num,
            refund_payment_amount,
            cart_count,
            favor_count,
            appraise_good_count,
            appraise_mid_count,
            appraise_bad_count,
            appraise_default_count
        from ${APP}.dws_sku_action_daycount
        where dt=date_add('$do_date',-7)
    )7d_ago
    on old.sku_id=7d_ago.sku_id
    left join
    (
        select
            sku_id,
            order_count,
            order_num,
            order_activity_count,
            order_coupon_count,
            order_activity_reduce_amount,
            order_coupon_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_num,
            payment_amount,
            refund_order_count,
            refund_order_num,
            refund_order_amount,
            refund_payment_count,
            refund_payment_num,
            refund_payment_amount,
            cart_count,
            favor_count,
            appraise_good_count,
            appraise_mid_count,
            appraise_bad_count,
            appraise_default_count
        from ${APP}.dws_sku_action_daycount
        where dt=date_add('$do_date',-30)
    )30d_ago
    on old.sku_id=30d_ago.sku_id;
    alter table ${APP}.dwt_sku_topic drop partition(dt='$clear_date');
    "
    
    # Activity topic: only 1-day values and cumulative totals are kept, so no 7d/30d joins are needed.
    dwt_activity_topic="
    insert overwrite table ${APP}.dwt_activity_topic partition(dt='$do_date')
    select
        nvl(1d_ago.activity_rule_id,old.activity_rule_id),
        nvl(1d_ago.activity_id,old.activity_id),
        nvl(1d_ago.order_count,0),
        nvl(1d_ago.order_reduce_amount,0.0),
        nvl(1d_ago.order_original_amount,0.0),
        nvl(1d_ago.order_final_amount,0.0),
        nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
        nvl(old.order_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0),
        nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
        nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
        nvl(1d_ago.payment_count,0),
        nvl(1d_ago.payment_reduce_amount,0.0),
        nvl(1d_ago.payment_amount,0.0),
        nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
        nvl(old.payment_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0),
        nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0)
    from
    (
        select
            activity_rule_id,
            activity_id,
            order_count,
            order_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_reduce_amount,
            payment_amount
        from ${APP}.dwt_activity_topic
        where dt=date_add('$do_date',-1)
    )old
    full outer join
    (
        select
            activity_rule_id,
            activity_id,
            order_count,
            order_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_reduce_amount,
            payment_amount
        from ${APP}.dws_activity_info_daycount
        where dt='$do_date'
    )1d_ago
    on old.activity_rule_id=1d_ago.activity_rule_id;
    alter table ${APP}.dwt_activity_topic drop partition(dt='$clear_date');
    "
    
    # Coupon topic: same pattern as the SKU topic, reading daily counts from dws_coupon_info_daycount.
    dwt_coupon_topic="
    insert overwrite table ${APP}.dwt_coupon_topic partition(dt='$do_date')
    select
        nvl(1d_ago.coupon_id,old.coupon_id),
        nvl(1d_ago.get_count,0),
        nvl(old.get_last_7d_count,0)+nvl(1d_ago.get_count,0)- nvl(7d_ago.get_count,0),
        nvl(old.get_last_30d_count,0)+nvl(1d_ago.get_count,0)- nvl(30d_ago.get_count,0),
        nvl(old.get_count,0)+nvl(1d_ago.get_count,0),
        nvl(1d_ago.order_count,0),
        nvl(1d_ago.order_reduce_amount,0.0),
        nvl(1d_ago.order_original_amount,0.0),
        nvl(1d_ago.order_final_amount,0.0),
        nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)- nvl(7d_ago.order_count,0),
        nvl(old.order_last_7d_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0)- nvl(7d_ago.order_reduce_amount,0.0),
        nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(7d_ago.order_original_amount,0.0),
        nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(7d_ago.order_final_amount,0.0),
        nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)- nvl(30d_ago.order_count,0),
        nvl(old.order_last_30d_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0)- nvl(30d_ago.order_reduce_amount,0.0),
        nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(30d_ago.order_original_amount,0.0),
        nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(30d_ago.order_final_amount,0.0),
        nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
        nvl(old.order_reduce_amount,0.0)+nvl(1d_ago.order_reduce_amount,0.0),
        nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
        nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
        nvl(1d_ago.payment_count,0),
        nvl(1d_ago.payment_reduce_amount,0.0),
        nvl(1d_ago.payment_amount,0.0),
        nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)- nvl(7d_ago.payment_count,0),
        nvl(old.payment_last_7d_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0)- nvl(7d_ago.payment_reduce_amount,0.0),
        nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(7d_ago.payment_amount,0.0),
        nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)- nvl(30d_ago.payment_count,0),
        nvl(old.payment_last_30d_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0)- nvl(30d_ago.payment_reduce_amount,0.0),
        nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(30d_ago.payment_amount,0.0),
        nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
        nvl(old.payment_reduce_amount,0.0)+nvl(1d_ago.payment_reduce_amount,0.0),
        nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
        nvl(1d_ago.expire_count,0),
        nvl(old.expire_last_7d_count,0)+nvl(1d_ago.expire_count,0)- nvl(7d_ago.expire_count,0),
        nvl(old.expire_last_30d_count,0)+nvl(1d_ago.expire_count,0)- nvl(30d_ago.expire_count,0),
        nvl(old.expire_count,0)+nvl(1d_ago.expire_count,0)
    from
    (
        select
            coupon_id,
            get_last_1d_count,
            get_last_7d_count,
            get_last_30d_count,
            get_count,
            order_last_1d_count,
            order_last_1d_reduce_amount,
            order_last_1d_original_amount,
            order_last_1d_final_amount,
            order_last_7d_count,
            order_last_7d_reduce_amount,
            order_last_7d_original_amount,
            order_last_7d_final_amount,
            order_last_30d_count,
            order_last_30d_reduce_amount,
            order_last_30d_original_amount,
            order_last_30d_final_amount,
            order_count,
            order_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_last_1d_count,
            payment_last_1d_reduce_amount,
            payment_last_1d_amount,
            payment_last_7d_count,
            payment_last_7d_reduce_amount,
            payment_last_7d_amount,
            payment_last_30d_count,
            payment_last_30d_reduce_amount,
            payment_last_30d_amount,
            payment_count,
            payment_reduce_amount,
            payment_amount,
            expire_last_1d_count,
            expire_last_7d_count,
            expire_last_30d_count,
            expire_count
        from ${APP}.dwt_coupon_topic
        where dt=date_add('$do_date',-1)
    )old
    full outer join
    (
        select
            coupon_id,
            get_count,
            order_count,
            order_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_reduce_amount,
            payment_amount,
            expire_count
        from ${APP}.dws_coupon_info_daycount
        where dt='$do_date'
    )1d_ago
    on old.coupon_id=1d_ago.coupon_id
    left join
    (
        select
            coupon_id,
            get_count,
            order_count,
            order_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_reduce_amount,
            payment_amount,
            expire_count
        from ${APP}.dws_coupon_info_daycount
        where dt=date_add('$do_date',-7)
    )7d_ago
    on old.coupon_id=7d_ago.coupon_id
    left join
    (
        select
            coupon_id,
            get_count,
            order_count,
            order_reduce_amount,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_reduce_amount,
            payment_amount,
            expire_count
        from ${APP}.dws_coupon_info_daycount
        where dt=date_add('$do_date',-30)
    )30d_ago
    on old.coupon_id=30d_ago.coupon_id;
    alter table ${APP}.dwt_coupon_topic drop partition(dt='$clear_date');
    "
    
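    # Area topic: per-province visit, login, order, payment and refund metrics with 1d/7d/30d windows and cumulative totals.
    dwt_area_topic="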
    insert overwrite table ${APP}.dwt_area_topic partition(dt='$do_date')
    select
        nvl(old.province_id, 1d_ago.province_id),
        nvl(1d_ago.visit_count,0),
        nvl(1d_ago.login_count,0),
        nvl(old.visit_last_7d_count,0)+nvl(1d_ago.visit_count,0)- nvl(7d_ago.visit_count,0),
        nvl(old.login_last_7d_count,0)+nvl(1d_ago.login_count,0)- nvl(7d_ago.login_count,0),
        nvl(old.visit_last_30d_count,0)+nvl(1d_ago.visit_count,0)- nvl(30d_ago.visit_count,0),
        nvl(old.login_last_30d_count,0)+nvl(1d_ago.login_count,0)- nvl(30d_ago.login_count,0),
        nvl(old.visit_count,0)+nvl(1d_ago.visit_count,0),
        nvl(old.login_count,0)+nvl(1d_ago.login_count,0),
        nvl(1d_ago.order_count,0),
        nvl(1d_ago.order_original_amount,0.0),
        nvl(1d_ago.order_final_amount,0.0),
        nvl(old.order_last_7d_count,0)+nvl(1d_ago.order_count,0)- nvl(7d_ago.order_count,0),
        nvl(old.order_last_7d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(7d_ago.order_original_amount,0.0),
        nvl(old.order_last_7d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(7d_ago.order_final_amount,0.0),
        nvl(old.order_last_30d_count,0)+nvl(1d_ago.order_count,0)- nvl(30d_ago.order_count,0),
        nvl(old.order_last_30d_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0)- nvl(30d_ago.order_original_amount,0.0),
        nvl(old.order_last_30d_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0)- nvl(30d_ago.order_final_amount,0.0),
        nvl(old.order_count,0)+nvl(1d_ago.order_count,0),
        nvl(old.order_original_amount,0.0)+nvl(1d_ago.order_original_amount,0.0),
        nvl(old.order_final_amount,0.0)+nvl(1d_ago.order_final_amount,0.0),
        nvl(1d_ago.payment_count,0),
        nvl(1d_ago.payment_amount,0.0),
        nvl(old.payment_last_7d_count,0)+nvl(1d_ago.payment_count,0)- nvl(7d_ago.payment_count,0),
        nvl(old.payment_last_7d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(7d_ago.payment_amount,0.0),
        nvl(old.payment_last_30d_count,0)+nvl(1d_ago.payment_count,0)- nvl(30d_ago.payment_count,0),
        nvl(old.payment_last_30d_amount,0.0)+nvl(1d_ago.payment_amount,0.0)- nvl(30d_ago.payment_amount,0.0),
        nvl(old.payment_count,0)+nvl(1d_ago.payment_count,0),
        nvl(old.payment_amount,0.0)+nvl(1d_ago.payment_amount,0.0),
        nvl(1d_ago.refund_order_count,0),
        nvl(1d_ago.refund_order_amount,0.0),
        nvl(old.refund_order_last_7d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(7d_ago.refund_order_count,0),
        nvl(old.refund_order_last_7d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)- nvl(7d_ago.refund_order_amount,0.0),
        nvl(old.refund_order_last_30d_count,0)+nvl(1d_ago.refund_order_count,0)- nvl(30d_ago.refund_order_count,0),
        nvl(old.refund_order_last_30d_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0)- nvl(30d_ago.refund_order_amount,0.0),
        nvl(old.refund_order_count,0)+nvl(1d_ago.refund_order_count,0),
        nvl(old.refund_order_amount,0.0)+nvl(1d_ago.refund_order_amount,0.0),
        nvl(1d_ago.refund_payment_count,0),
        nvl(1d_ago.refund_payment_amount,0.0),
        nvl(old.refund_payment_last_7d_count,0)+nvl(1d_ago.refund_payment_count,0)- nvl(7d_ago.refund_payment_count,0),
        nvl(old.refund_payment_last_7d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)- nvl(7d_ago.refund_payment_amount,0.0),
        nvl(old.refund_payment_last_30d_count,0)+nvl(1d_ago.refund_payment_count,0)- nvl(30d_ago.refund_payment_count,0),
        nvl(old.refund_payment_last_30d_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)- nvl(30d_ago.refund_payment_amount,0.0),
        nvl(old.refund_payment_count,0)+nvl(1d_ago.refund_payment_count,0),
        nvl(old.refund_payment_amount,0.0)+nvl(1d_ago.refund_payment_amount,0.0)
    
    from
    (
        select
            province_id,
            visit_last_1d_count,
            login_last_1d_count,
            visit_last_7d_count,
            login_last_7d_count,
            visit_last_30d_count,
            login_last_30d_count,
            visit_count,
            login_count,
            order_last_1d_count,
            order_last_1d_original_amount,
            order_last_1d_final_amount,
            order_last_7d_count,
            order_last_7d_original_amount,
            order_last_7d_final_amount,
            order_last_30d_count,
            order_last_30d_original_amount,
            order_last_30d_final_amount,
            order_count,
            order_original_amount,
            order_final_amount,
            payment_last_1d_count,
            payment_last_1d_amount,
            payment_last_7d_count,
            payment_last_7d_amount,
            payment_last_30d_count,
            payment_last_30d_amount,
            payment_count,
            payment_amount,
            refund_order_last_1d_count,
            refund_order_last_1d_amount,
            refund_order_last_7d_count,
            refund_order_last_7d_amount,
            refund_order_last_30d_count,
            refund_order_last_30d_amount,
            refund_order_count,
            refund_order_amount,
            refund_payment_last_1d_count,
            refund_payment_last_1d_amount,
            refund_payment_last_7d_count,
            refund_payment_last_7d_amount,
            refund_payment_last_30d_count,
            refund_payment_last_30d_amount,
            refund_payment_count,
            refund_payment_amount
        from ${APP}.dwt_area_topic
        where dt=date_add('$do_date',-1)
    )old
    full outer join
    (
        select
            province_id,
            visit_count,
            login_count,
            order_count,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_amount,
            refund_order_count,
            refund_order_amount,
            refund_payment_count,
            refund_payment_amount
        from ${APP}.dws_area_stats_daycount
        where dt='$do_date'
    )1d_ago
    on old.province_id=1d_ago.province_id
    left join
    (
        select
            province_id,
            visit_count,
            login_count,
            order_count,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_amount,
            refund_order_count,
            refund_order_amount,
            refund_payment_count,
            refund_payment_amount
        from ${APP}.dws_area_stats_daycount
        where dt=date_add('$do_date',-7)
    )7d_ago
    on old.province_id= 7d_ago.province_id
    left join
    (
        select
            province_id,
            visit_count,
            login_count,
            order_count,
            order_original_amount,
            order_final_amount,
            payment_count,
            payment_amount,
            refund_order_count,
            refund_order_amount,
            refund_payment_count,
            refund_payment_amount
        from ${APP}.dws_area_stats_daycount
        where dt=date_add('$do_date',-30)
    )30d_ago
    on old.province_id= 30d_ago.province_id;
    alter table ${APP}.dwt_area_topic drop partition(dt='$clear_date');
    "
    
    
    case $1 in
        "dwt_visitor_topic" )
            hive -e "$dwt_visitor_topic"
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_visitor_topic/dt=$clear_date
        ;;
        "dwt_user_topic" )
            hive -e "$dwt_user_topic"
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_user_topic/dt=$clear_date
        ;;
        "dwt_sku_topic" )
            hive -e "$dwt_sku_topic"
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_sku_topic/dt=$clear_date
        ;;
        "dwt_activity_topic" )
            hive -e "$dwt_activity_topic"
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_activity_topic/dt=$clear_date
        ;;
        "dwt_coupon_topic" )
            hive -e "$dwt_coupon_topic"
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_coupon_topic/dt=$clear_date
        ;;
        "dwt_area_topic" )
            hive -e "$dwt_area_topic"
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_area_topic/dt=$clear_date
        ;;
        "all" )
            hive -e "$dwt_visitor_topic$dwt_user_topic$dwt_sku_topic$dwt_activity_topic$dwt_coupon_topic$dwt_area_topic"
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_visitor_topic/dt=$clear_date
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_user_topic/dt=$clear_date
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_sku_topic/dt=$clear_date
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_activity_topic/dt=$clear_date
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_coupon_topic/dt=$clear_date
            hadoop fs -rm -r -f /warehouse/gmall/dwt/dwt_area_topic/dt=$clear_date
        ;;
    esac
    
  2. Make the script executable

    chmod +x dws_to_dwt.sh

  3. Run the script

    dws_to_dwt.sh all 2020-06-14
    
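
The `dwt_area_topic` load in the script above maintains its 7-day and 30-day columns incrementally instead of rescanning the whole window: it takes the window value from the previous day's DWT partition, adds today's value, and subtracts the value of the day that drops out of the window. A minimal sketch of that arithmetic follows; `roll_window` is a hypothetical helper written for illustration and is not part of `dws_to_dwt.sh`.

```shell
#!/bin/bash
# Hypothetical helper (not from the source). It mirrors the expression
#   nvl(old.x_last_7d_count,0) + nvl(1d_ago.x,0) - nvl(7d_ago.x,0)
# An empty or missing argument is treated as 0, like nvl(...,0).
roll_window() {
  local old_window=${1:-0}   # window value stored in the previous day's DWT partition
  local today=${2:-0}        # today's value from the DWS daycount table
  local expired=${3:-0}      # value of the day that leaves the window
  echo $(( old_window + today - expired ))
}

roll_window 70 12 9
```

With a previous 7-day total of 70, 12 new events today, and 9 events on the day that leaves the window, the new 7-day total is 73.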

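# Data Warehouse Construction: ADS Layer

The ADS layer involves no modeling; its tables are created according to the specific requirements.

# Visitor Topic

# Visitor Statistics

This requirement is a comprehensive visitor report made up of several metrics, each explained below.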

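| Metric | Description | Field |
| --- | --- | --- |
| Visitor count | Number of distinct visitors | uv_count |
| Page dwell time | Total duration of all page-view records, in seconds | duration_sec |
| Average page dwell time | Average dwell time per session, in seconds | avg_duration_sec |
| Total page views | Total number of page-view records | page_count |
| Average page views | Average number of pages viewed per session | avg_page_count |
| Session count | Total number of sessions | sv_count |
| Bounce count | Number of sessions that viewed only one page | bounce_count |
| Bounce rate | Share of sessions that viewed only one page, as a percentage | bounce_rate |
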
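
As a quick check of the last two metrics: the bounce rate is the bounce count divided by the session count, expressed as a percentage with two decimals. The sketch below is illustrative only; `bounce_rate` and its argument order are assumptions, not something defined in this project.

```shell
#!/bin/bash
# Hypothetical helper for illustration; the name is not from the source.
# $1 = number of sessions with exactly one page view (bounce_count)
# $2 = total number of sessions (sv_count)
bounce_rate() {
  awk -v bounces="$1" -v sessions="$2" 'BEGIN {
    if (sessions == 0) { print "NULL"; exit }   # avoid dividing by zero
    printf "%.2f\n", bounces / sessions * 100
  }'
}

bounce_rate 3 8
```

Three single-page sessions out of eight give a bounce rate of 37.50.
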
  1. Table creation statement

    DROP TABLE IF EXISTS ads_visit_stats;
    CREATE EXTERNAL TABLE ads_visit_stats (
      `dt` STRING COMMENT 'Statistics date',
      `is_new` STRING COMMENT 'New/returning visitor flag, 1: new, 0: returning',
      `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `channel` STRING COMMENT 'Channel',
      `uv_count` BIGINT COMMENT 'Daily active visitors (number of visitors)',
      `duration_sec` BIGINT COMMENT 'Total page dwell time, in seconds',
      `avg_duration_sec` BIGINT COMMENT 'Average page dwell time per session, in seconds',
      `page_count` BIGINT COMMENT 'Total page views',
      `avg_page_count` BIGINT COMMENT 'Average page views per session',
      `sv_count` BIGINT COMMENT 'Number of sessions',
      `bounce_count` BIGINT COMMENT 'Number of bounces',
      `bounce_rate` DECIMAL(16,2) COMMENT 'Bounce rate'
    ) COMMENT 'Visitor statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_visit_stats/';
    
  2. Data loading

    1. Split all page-view records into sessions

    2. Compute the dwell time and page-view count of each session

    3. Compute the metrics listed above

      insert overwrite table ads_visit_stats
      select * from ads_visit_stats
      union
      select
          '2020-06-14' dt,
          is_new,
          recent_days,
          channel,
          count(distinct(mid_id)) uv_count,
          cast(sum(duration)/1000 as bigint) duration_sec,
          cast(avg(duration)/1000 as bigint) avg_duration_sec,
          sum(page_count) page_count,
          cast(avg(page_count) as bigint) avg_page_count,
          count(*) sv_count,
          sum(if(page_count=1,1,0)) bounce_count,
          cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
      from
      (
          select
              session_id,
              mid_id,
              is_new,
              recent_days,
              channel,
              count(*) page_count,
              sum(during_time) duration
          from
          (
              select
                  mid_id,
                  channel,
                  recent_days,
                  is_new,
                  last_page_id,
                  page_id,
                  during_time,
                  concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
              from
              (
                  select
                      mid_id,
                      channel,
                      last_page_id,
                      page_id,
                      during_time,
                      ts,
                      recent_days,
                      if(visit_date_first>=date_add('2020-06-14',-recent_days+1),'1','0') is_new
                  from
                  (
                      select
                          t1.mid_id,
                          t1.channel,
                          t1.last_page_id,
                          t1.page_id,
                          t1.during_time,
                          t1.dt,
                          t1.ts,
                          t2.visit_date_first
                      from
                      (
                          select
                              mid_id,
                              channel,
                              last_page_id,
                              page_id,
                              during_time,
                              dt,
                              ts
                          from dwd_page_log
                          where dt>=date_add('2020-06-14',-30)
                      )t1
                      left join
                      (
                          select
                              mid_id,
                              visit_date_first
                          from dwt_visitor_topic
                          where dt='2020-06-14'
                      )t2
                      on t1.mid_id=t2.mid_id
                  )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
                  where dt>=date_add('2020-06-14',-recent_days+1)
              )t4
          )t5
          group by session_id,mid_id,is_new,recent_days,channel
      )t6
      group by is_new,recent_days,channel;
      
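
The session split in the query above relies on one observation: a page view whose `last_page_id` is NULL is the entry page of a session, so every later row of the same visitor belongs to that session until the next entry page appears. The query expresses this with `last_value(if(last_page_id is null,ts,null),true)`. The sketch below replays the same idea with `awk` on a few made-up rows; the function name, the sample data, and the use of an empty field in place of NULL are all assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical illustration of the session-id rule used in the query.
# Input (tab-separated): mid_id, last_page_id, page_id, ts
# Rows must already be ordered by mid_id and ts, which is what
# "partition by mid_id order by ts" guarantees in the query.
assign_sessions() {
  awk -F'\t' '
    # An empty last_page_id marks the first page of a new session,
    # so remember its timestamp as the session start for this visitor.
    $2 == "" { start[$1] = $4 }
    # session_id = mid_id + "-" + timestamp of the latest session start
    { print $1 "-" start[$1] "\t" $3 }
  '
}

printf 'm1\t\thome\t1000\nm1\thome\tgood_detail\t2000\nm1\t\thome\t9000\n' | assign_sessions
```

The first two rows share the session id `m1-1000`; the third row starts a new session, `m1-9000`.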

# Path Analysis

  1. Table creation statement

    DROP TABLE IF EXISTS ads_page_path;
    CREATE EXTERNAL TABLE ads_page_path
    (
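        `dt` STRING COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
        `source` STRING COMMENT 'ID of the page the jump starts from',
        `target` STRING COMMENT 'ID of the page the jump ends at',
        `path_count` BIGINT COMMENT 'Number of jumps'
    )  COMMENT 'Page view paths'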
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_page_path/';
    
  2. Data loading

    insert overwrite table ads_page_path
    select * from ads_page_path
    union
    select
        '2020-06-14',
        recent_days,
        source,
        target,
        count(*)
    from
    (
        select
            recent_days,
            concat('step-',step,':',source) source,
            concat('step-',step+1,':',target) target
        from
        (
            select
                recent_days,
                page_id source,
                lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
                row_number() over (partition by recent_days,session_id order by ts) step
            from
            (
                select
                    recent_days,
                    last_page_id,
                    page_id,
                    ts,
                    concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
                from dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
                where dt>=date_add('2020-06-14',-30)
                and dt>=date_add('2020-06-14',-recent_days+1)
            )t2
        )t3
    )t4
    group by recent_days,source,target;
    
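
Within each session, the query above pairs every page with the next one (`lead`) and prefixes both with their position (`row_number`), so the same page at different steps yields different `source` and `target` labels. The last page of a session has no successor, and because Hive's `concat` returns NULL when any argument is NULL, its `target` is NULL. The sketch below reproduces that labelling for a single session; `page_paths` and the sample pages are hypothetical.

```shell
#!/bin/bash
# Hypothetical illustration of the source/target labelling.
# stdin: one page_id per line, in ts order, all from a single session.
page_paths() {
  awk '
    # From the second page on, emit the jump from the previous page to this one.
    NR > 1 { print "step-" (NR - 1) ":" prev "\t" "step-" NR ":" $0 }
    { prev = $0 }
    # The last page has no next page, so its target is NULL.
    END { if (NR > 0) print "step-" NR ":" prev "\tNULL" }
  '
}

printf 'home\ngood_detail\ncart\n' | page_paths
```

Grouping these rows by `source` and `target` and counting them gives `path_count`.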

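# User Topic

# User Statistics

This requirement is a comprehensive user report made up of several metrics, each explained below.

| Metric | Description | Field |
| --- | --- | --- |
| New users | Number of newly registered users | new_user_count |
| New ordering users | Number of users who placed their first order | new_order_user_count |
| Total order amount | Total amount of all orders | order_final_amount |
| Ordering users | Total number of users who placed an order | order_user_count |
| Non-ordering users | Number of users who were active but placed no order | no_order_user_count |

  1. Table creation statement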

    DROP TABLE IF EXISTS ads_user_total;
    CREATE EXTERNAL TABLE `ads_user_total` (
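      `dt` STRING COMMENT 'Statistics date',
      `recent_days` BIGINT COMMENT 'Recent days, 0: cumulative, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `new_user_count` BIGINT COMMENT 'Number of newly registered users',
      `new_order_user_count` BIGINT COMMENT 'Number of new ordering users',
      `order_final_amount` DECIMAL(16,2) COMMENT 'Total order amount',
      `order_user_count` BIGINT COMMENT 'Number of ordering users',
      `no_order_user_count` BIGINT COMMENT 'Number of non-ordering users (active users who placed no order)'
    ) COMMENT 'User statistics'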
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_total/';
    
  2. Data loading

    insert overwrite table ads_user_total
    select * from ads_user_total
    union
    select
        '2020-06-14',
        recent_days,
        sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
        sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
        sum(order_final_amount) order_final_amount,
        sum(if(order_final_amount>0,1,0)) order_user_count,
        sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
    from
    (
        select
            recent_days,
            user_id,
            login_date_first,
            login_date_last,
            order_date_first,
            case when recent_days=0 then order_final_amount
                 when recent_days=1 then order_last_1d_final_amount
                 when recent_days=7 then order_last_7d_final_amount
                 when recent_days=30 then order_last_30d_final_amount
            end order_final_amount,
            if(recent_days=0,'1970-01-01',date_add('2020-06-14',-recent_days+1)) recent_days_ago
        from dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days;
    
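
The query above turns each `recent_days` value into the first day of its window: `date_add('2020-06-14',-recent_days+1)` for 1, 7, and 30, and `'1970-01-01'` for the cumulative case (`recent_days=0`), so every first-login or first-order date counts. The sketch below computes the same boundaries in bash. `window_start` is a hypothetical helper, and it assumes GNU `date` (the `-d` option), as found on typical Linux servers.

```shell
#!/bin/bash
# Hypothetical helper; assumes GNU date (-d and @epoch syntax).
# $1 = statistics date (yyyy-MM-dd), $2 = recent_days (0, 1, 7 or 30)
window_start() {
  local do_date=$1 recent_days=$2
  if [ "$recent_days" -eq 0 ]; then
    echo 1970-01-01            # cumulative: no lower bound in practice
  else
    # Work in UTC epoch seconds so that daylight-saving shifts cannot interfere.
    local end_epoch
    end_epoch=$(date -u -d "$do_date" +%s)
    date -u -d "@$(( end_epoch - (recent_days - 1) * 86400 ))" +%F
  fi
}

for days in 0 1 7 30; do
  echo "$days -> $(window_start 2020-06-14 "$days")"
done
```

The window includes the statistics date itself, which is why the offset is `recent_days - 1` rather than `recent_days`.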

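# User Change Statistics

This requirement covers two metrics, churned users and returning users, explained below.

| Metric | Description | Field |
| --- | --- | --- |
| Churned users | Users who were active before but have not been active recently are called churned users. Here we count the users who were active 7 days ago (on that day only) and have not been active in the most recent 7 days. | user_churn_count |
| Returning users | Users who were active before, went inactive for a while (churned), and became active again today are called returning users. Here we count the total number of returning users. | user_back_count |

  1. Table creation statement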

    DROP TABLE IF EXISTS ads_user_change;
    CREATE EXTERNAL TABLE `ads_user_change` (
      `dt` STRING COMMENT 'Statistics date',
      `user_churn_count` BIGINT COMMENT 'Number of churned users',
      `user_back_count` BIGINT COMMENT 'Number of returning users'
    ) COMMENT 'User change statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_change/';
    
  2. Data loading

    insert overwrite table ads_user_change
    select * from ads_user_change
    union
    select
        churn.dt,
        user_churn_count,
        user_back_count
    from
    (
        select
            '2020-06-14' dt,
            count(*) user_churn_count
        from dwt_user_topic
        where dt='2020-06-14'
        and login_date_last=date_add('2020-06-14',-7)
    )churn
    join
    (
        select
            '2020-06-14' dt,
            count(*) user_back_count
        from
        (
            select
                user_id,
                login_date_last
            from dwt_user_topic
            where dt='2020-06-14'
            and login_date_last='2020-06-14'
        )t1
        join
        (
            select
                user_id,
                login_date_last login_date_previous
            from dwt_user_topic
            where dt=date_add('2020-06-14',-1)
        )t2
        on t1.user_id=t2.user_id
        where datediff(login_date_last,login_date_previous)>=8
    )back
    on churn.dt=back.dt;
    
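
In the query above, a returning user is someone who logged in today and whose previous last login, read from yesterday's partition, is at least 8 days earlier: `datediff(login_date_last,login_date_previous)>=8`. A gap of 8 means the 7 days in between had no login at all. The sketch below applies the same test to two dates; both function names are hypothetical and GNU `date` is assumed.

```shell
#!/bin/bash
# Hypothetical helpers; assume GNU date (-d option).
# Prints the number of whole days from $2 to $1, like datediff($1, $2).
days_between() {
  local later earlier
  later=$(date -u -d "$1" +%s)
  earlier=$(date -u -d "$2" +%s)
  echo $(( (later - earlier) / 86400 ))
}

# Succeeds when the user counts as returning.
# $1 = today's login date, $2 = previous last login date
is_back_user() {
  [ "$(days_between "$1" "$2")" -ge 8 ]
}

is_back_user 2020-06-14 2020-06-06 && echo "returning" || echo "not returning"
is_back_user 2020-06-14 2020-06-07 && echo "returning" || echo "not returning"
```

A previous login on 2020-06-06 leaves seven inactive days before 2020-06-14, so the user is counted; a previous login on 2020-06-07 leaves only six, so the user is not.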

# User Behavior Funnel Analysis

  1. Table creation statement

    DROP TABLE IF EXISTS ads_user_action;
    CREATE EXTERNAL TABLE `ads_user_action` (
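      `dt` STRING COMMENT 'Statistics date',
      `recent_days` BIGINT COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `home_count` BIGINT COMMENT 'Number of visitors who viewed the home page',
      `good_detail_count` BIGINT COMMENT 'Number of visitors who viewed a product detail page',
      `cart_count` BIGINT COMMENT 'Number of users who added items to the cart',
      `order_count` BIGINT COMMENT 'Number of users who placed an order',
      `payment_count` BIGINT COMMENT 'Number of users who paid'
    ) COMMENT 'Funnel analysis'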
    ROW FORMAT DELIMITED  FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_action/';
    
  2. Data loading

    with
    tmp_page as
    (
        select
            '2020-06-14' dt,
            recent_days,
            sum(if(array_contains(pages,'home'),1,0)) home_count,
            sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
        from
        (
            select
                recent_days,
                mid_id,
                collect_set(page_id) pages
            from
            (
                select
                    dt,
                    mid_id,
                    page.page_id
                from dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
                where dt>=date_add('2020-06-14',-29)
                and page.page_id in('home','good_detail')
            )t1 lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('2020-06-14',-recent_days+1)
            group by recent_days,mid_id
        )t2
        group by recent_days
    ),
    tmp_cop as
    (
        select
            '2020-06-14' dt,
            recent_days,
            sum(if(cart_count>0,1,0)) cart_count,
            sum(if(order_count>0,1,0)) order_count,
            sum(if(payment_count>0,1,0)) payment_count
        from
        (
            select
                recent_days,
                user_id,
                case
                    when recent_days=1 then cart_last_1d_count
                    when recent_days=7 then cart_last_7d_count
                    when recent_days=30 then cart_last_30d_count
                end cart_count,
                case
                    when recent_days=1 then order_last_1d_count
                    when recent_days=7 then order_last_7d_count
                    when recent_days=30 then order_last_30d_count
                end order_count,
                case
                    when recent_days=1 then payment_last_1d_count
                    when recent_days=7 then payment_last_7d_count
                    when recent_days=30 then payment_last_30d_count
                end payment_count
            from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt='2020-06-14'
        )t1
        group by recent_days
    )
    insert overwrite table ads_user_action
    select * from ads_user_action
    union
    select
        tmp_page.dt,
        tmp_page.recent_days,
        home_count,
        good_detail_count,
        cart_count,
        order_count,
        payment_count
    from tmp_page
    join tmp_cop
    on tmp_page.recent_days=tmp_cop.recent_days;
    

# User Retention Rate

  1. Table creation statement

    DROP TABLE IF EXISTS ads_user_retention;
    CREATE EXTERNAL TABLE ads_user_retention (
      `dt` STRING COMMENT 'Statistics date',
      `create_date` STRING COMMENT 'Date the users were added',
      `retention_day` BIGINT COMMENT 'Days retained as of the statistics date',
      `retention_count` BIGINT COMMENT 'Number of retained users',
      `new_user_count` BIGINT COMMENT 'Number of new users',
      `retention_rate` DECIMAL(16,2) COMMENT 'Retention rate'
    ) COMMENT 'User retention rate'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_user_retention/';
    
  2. Data loading

    insert overwrite table ads_user_retention
    select * from ads_user_retention
    union
    select
        '2020-06-14',
        login_date_first create_date,
        datediff('2020-06-14',login_date_first) retention_day,
        sum(if(login_date_last='2020-06-14',1,0)) retention_count,
        count(*) new_user_count,
        cast(sum(if(login_date_last='2020-06-14',1,0))/count(*)*100 as decimal(16,2)) retention_rate
    from dwt_user_topic
    where dt='2020-06-14'
    and login_date_first>=date_add('2020-06-14',-7)
    and login_date_first<'2020-06-14'
    group by login_date_first;
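
The load statement above groups users by first-login date within the seven days before the statistics date and counts those whose last login is the statistics date itself. The sketch below replays that logic on a few made-up rows with `awk`. The function name `retention_by_cohort`, the input layout (`login_date_first login_date_last` per line), and the sample dates are assumptions made for illustration only, not part of the warehouse.

```shell
#!/bin/bash
# Hypothetical helper, not part of the warehouse scripts.
# stdin: one user per line as "login_date_first login_date_last"
# $1: statistics date, $2: first cohort date in the window (statistics date minus 7 days)
# Output columns: create_date retention_count new_user_count retention_rate
retention_by_cohort() {
  local do_date=$1 window_start=$2
  awk -v d="$do_date" -v s="$window_start" '
    # ISO dates compare correctly as plain strings
    $1 >= s && $1 < d {
      total[$1]++                  # new users of this cohort
      if ($2 == d) kept[$1]++      # still active on the statistics date
    }
    END {
      for (c in total)
        printf "%s %d %d %.2f\n", c, kept[c], total[c], kept[c] / total[c] * 100
    }' | sort
}

# The 2020-06-01 user falls outside the 7-day window and is ignored.
retention_by_cohort 2020-06-14 2020-06-07 <<'EOF'
2020-06-13 2020-06-14
2020-06-13 2020-06-13
2020-06-10 2020-06-14
2020-06-01 2020-06-14
EOF
```

With this sample, the 2020-06-13 cohort has two new users of whom one logged in on the statistics date, so its rate is 50.00.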
    

# Product Topic

# Product Statistics

  1. Table creation statement

    DROP TABLE IF EXISTS ads_order_spu_stats;
    CREATE EXTERNAL TABLE `ads_order_spu_stats` (
        `dt` STRING COMMENT 'Statistics date',
        `recent_days` BIGINT COMMENT 'Recent-day window, 1: last 1 day, 7: last 7 days, 30: last 30 days',
        `spu_id` STRING COMMENT 'Product ID',
        `spu_name` STRING COMMENT 'Product name',
        `tm_id` STRING COMMENT 'Brand ID',
        `tm_name` STRING COMMENT 'Brand name',
        `category3_id` STRING COMMENT 'Level-3 category ID',
        `category3_name` STRING COMMENT 'Level-3 category name',
        `category2_id` STRING COMMENT 'Level-2 category ID',
        `category2_name` STRING COMMENT 'Level-2 category name',
        `category1_id` STRING COMMENT 'Level-1 category ID',
        `category1_name` STRING COMMENT 'Level-1 category name',
        `order_count` BIGINT COMMENT 'Number of orders',
        `order_amount` DECIMAL(16,2) COMMENT 'Order amount'
    ) COMMENT 'Product sales statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_order_spu_stats/';
    
  2. Data loading

    insert overwrite table ads_order_spu_stats
    select * from ads_order_spu_stats
    union
    select
        '2020-06-14' dt,
        recent_days,
        spu_id,
        spu_name,
        tm_id,
        tm_name,
        category3_id,
        category3_name,
        category2_id,
        category2_name,
        category1_id,
        category1_name,
        sum(order_count),
        sum(order_amount)
    from
    (
        select
            recent_days,
            sku_id,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_amount
        from dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    left join
    (
        select
            id,
            spu_id,
            spu_name,
            tm_id,
            tm_name,
            category3_id,
            category3_name,
            category2_id,
            category2_name,
            category1_id,
            category1_name
        from dim_sku_info
        where dt='2020-06-14'
    )t2
    on t1.sku_id=t2.id
    group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;
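
Each load statement in this layer follows the same pattern: read the table's current contents, `union` them with the newly computed rows, and overwrite the table with the result. Because Hive's plain `union` removes duplicate rows, re-running the load for the same date with unchanged inputs does not duplicate that date's rows. The sketch below mimics the idea on plain text files. `append_dedup` and the file layout are hypothetical stand-ins, with `sort -u` playing the role of `union`.

```shell
#!/bin/bash
# Hypothetical stand-in for:
#   insert overwrite table t select * from t union select <new rows>
# $1: file holding the table's current rows, $2: file holding the newly computed rows
append_dedup() {
  local table_file=$1 new_rows_file=$2
  local tmp
  tmp=$(mktemp)
  # sort -u keeps one copy of identical lines, as union does for identical rows
  LC_ALL=C sort -u "$table_file" "$new_rows_file" > "$tmp"
  mv "$tmp" "$table_file"    # overwrite the table with the merged result
}

table=$(mktemp)
batch=$(mktemp)
printf '2020-06-13\t1\t10\n' > "$table"
printf '2020-06-14\t1\t12\n' > "$batch"
append_dedup "$table" "$batch"
append_dedup "$table" "$batch"   # second run leaves the table unchanged
cat "$table"
rm -f "$table" "$batch"
```

The limitation is that only byte-identical rows collapse: if the inputs change between runs, both the old and the new row for that date remain in the table.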
    

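# Brand Repeat Purchase Rate

The brand repeat purchase rate is the ratio of the number of users who bought a brand repeatedly within a period to the number of users who bought that brand at all within the same period. A repeat purchase means at least 2 purchases; having purchased means at least 1 purchase.

The requirement here is to compute the repeat purchase rate of each brand for the last 1, 7, and 30 days.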
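
As a quick check of the definition, the sketch below computes the rate for one brand from a made-up list of per-user order counts. The function name `repeat_rate` and the sample counts are illustrative assumptions; the arithmetic mirrors the `order_count>=2` over `order_count>=1` ratio used in the load statement. Printing `NULL` when nobody bought the brand is meant to reflect Hive returning NULL on division by zero.

```shell
#!/bin/bash
# Hypothetical helper: takes one per-user order count per argument
# and prints the repeat purchase rate as a percentage with two decimals.
repeat_rate() {
  printf '%s\n' "$@" | awk '
    $1 >= 1 { bought++ }      # users who bought the brand at least once
    $1 >= 2 { repeated++ }    # users who bought it two or more times
    END {
      if (bought == 0) { print "NULL"; exit }   # no buyers: rate undefined
      printf "%.2f\n", repeated / bought * 100
    }'
}

# Four users with 1, 2, 3 and 1 orders: 2 of the 4 buyers repeated.
repeat_rate 1 2 3 1   # prints 50.00
```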

  1. Table creation statement

    DROP TABLE IF EXISTS ads_repeat_purchase;
    CREATE EXTERNAL TABLE `ads_repeat_purchase` (
      `dt` STRING COMMENT 'Statistics date',
      `recent_days` BIGINT COMMENT 'Recent-day window, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `tm_id` STRING COMMENT 'Brand ID',
      `tm_name` STRING COMMENT 'Brand name',
      `order_repeat_rate` DECIMAL(16,2) COMMENT 'Repeat purchase rate'
    ) COMMENT 'Brand repeat purchase rate'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_repeat_purchase/';
    
  2. Data loading

    insert overwrite table ads_repeat_purchase
    select * from ads_repeat_purchase
    union
    select
        '2020-06-14' dt,
        recent_days,
        tm_id,
        tm_name,
        cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
    from
    (
        select
            recent_days,
            user_id,
            tm_id,
            tm_name,
            sum(order_count) order_count
        from
        (
            select
                recent_days,
                user_id,
                sku_id,
                count(*) order_count
            from dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('2020-06-14',-29)
            and dt>=date_add('2020-06-14',-recent_days+1)
            group by recent_days, user_id,sku_id
        )t1
        left join
        (
            select
                id,
                tm_id,
                tm_name
            from dim_sku_info
            where dt='2020-06-14'
        )t2
        on t1.sku_id=t2.id
        group by recent_days,user_id,tm_id,tm_name
    )t3
    group by recent_days,tm_id,tm_name;
    

# Order Topic

# Order Statistics

  1. Table creation statement

    DROP TABLE IF EXISTS ads_order_total;
    CREATE EXTERNAL TABLE `ads_order_total` (
      `dt` STRING COMMENT 'Statistics date',
      `recent_days` BIGINT COMMENT 'Recent-day window, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `order_count` BIGINT COMMENT 'Number of orders',
      `order_amount` DECIMAL(16,2) COMMENT 'Order amount',
      `order_user_count` BIGINT COMMENT 'Number of users who placed orders'
    ) COMMENT 'Order statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_order_total/';
    
  2. Data loading

    insert overwrite table ads_order_total
    select * from ads_order_total
    union
    select
        '2020-06-14',
        recent_days,
        sum(order_count),
        sum(order_final_amount) order_final_amount,
        sum(if(order_final_amount>0,1,0)) order_user_count
    from
    (
        select
            recent_days,
            user_id,
            case when recent_days=0 then order_count
                 when recent_days=1 then order_last_1d_count
                 when recent_days=7 then order_last_7d_count
                 when recent_days=30 then order_last_30d_count
            end order_count,
            case when recent_days=0 then order_final_amount
                 when recent_days=1 then order_last_1d_final_amount
                 when recent_days=7 then order_last_7d_final_amount
                 when recent_days=30 then order_last_30d_final_amount
            end order_final_amount
        from dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='2020-06-14'
    )t1
    group by recent_days;
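
The subquery above fans each user row out into one row per window with `lateral view explode(Array(1,7,30))`, and the `case` expression then picks the pre-aggregated column that matches the window. Below is a rough sketch of the same fan-out and aggregation in `awk`, assuming a hypothetical input layout of `user_id order_last_1d_count order_last_7d_count order_last_30d_count` per line. For brevity it counts ordering users by order count rather than by order amount as the load statement does, and the function name `order_totals` is made up for this example.

```shell
#!/bin/bash
# Hypothetical helper, not part of the warehouse scripts.
# stdin: "user_id order_last_1d_count order_last_7d_count order_last_30d_count"
# Output columns: recent_days order_count order_user_count
order_totals() {
  awk '
    {
      for (i = 1; i <= 3; i++) {   # one pass per window, like explode(Array(1,7,30))
        cnt = $(i + 1)             # pick the column that matches the window
        orders[i] += cnt
        if (cnt > 0) users[i]++    # user placed at least one order in the window
      }
    }
    END {
      split("1 7 30", days, " ")
      for (i = 1; i <= 3; i++) printf "%s %d %d\n", days[i], orders[i], users[i]
    }'
}

order_totals <<'EOF'
u1 0 2 5
u2 1 1 1
u3 0 0 3
EOF
```

For this sample, the 7-day window sums to 3 orders from 2 users, because `u3` placed no order in the last 7 days.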
    

# Order Statistics by Region

  1. Table creation statement

    DROP TABLE IF EXISTS ads_order_by_province;
    CREATE EXTERNAL TABLE `ads_order_by_province` (
      `dt` STRING COMMENT 'Statistics date',
      `recent_days` BIGINT COMMENT 'Recent-day window, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `province_id` STRING COMMENT 'Province ID',
      `province_name` STRING COMMENT 'Province name',
      `area_code` STRING COMMENT 'Area code',
      `iso_code` STRING COMMENT 'International standard region code',
      `iso_code_3166_2` STRING COMMENT 'International standard region code (ISO 3166-2)',
      `order_count` BIGINT COMMENT 'Number of orders',
      `order_amount` DECIMAL(16,2) COMMENT 'Order amount'
    ) COMMENT 'Order statistics by region'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_order_by_province/';
    
  2. Data loading

    insert overwrite table ads_order_by_province
    select * from ads_order_by_province
    union
    select
        dt,
        recent_days,
        province_id,
        province_name,
        area_code,
        iso_code,
        iso_3166_2,
        order_count,
        order_amount
    from
    (
        select
            '2020-06-14' dt,
            recent_days,
            province_id,
            sum(order_count) order_count,
            sum(order_amount) order_amount
        from
        (
            select
                recent_days,
                province_id,
                case
                    when recent_days=1 then order_last_1d_count
                    when recent_days=7 then order_last_7d_count
                    when recent_days=30 then order_last_30d_count
                end order_count,
                case
                    when recent_days=1 then order_last_1d_final_amount
                    when recent_days=7 then order_last_7d_final_amount
                    when recent_days=30 then order_last_30d_final_amount
                end order_amount
            from dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt='2020-06-14'
        )t1
        group by recent_days,province_id
    )t2
    join dim_base_province t3
    on t2.province_id=t3.id;
    

# Coupon Topic

# Coupon Statistics

  1. Table creation statement

    DROP TABLE IF EXISTS ads_coupon_stats;
    CREATE EXTERNAL TABLE ads_coupon_stats (
      `dt` STRING COMMENT 'Statistics date',
      `coupon_id` STRING COMMENT 'Coupon ID',
      `coupon_name` STRING COMMENT 'Coupon name',
      `start_date` STRING COMMENT 'Release date',
      `rule_name` STRING COMMENT 'Discount rule, for example 满100元减10元 (spend 100 yuan, save 10 yuan)',
      `get_count` BIGINT COMMENT 'Number of times claimed',
      `order_count` BIGINT COMMENT 'Number of times used (in orders)',
      `expire_count` BIGINT COMMENT 'Number of times expired',
      `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders using the coupon',
      `order_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders using the coupon',
      `reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount',
      `reduce_rate` DECIMAL(16,2) COMMENT 'Subsidy rate'
    ) COMMENT 'Coupon statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_coupon_stats/';
    
  2. Data loading

    insert overwrite table ads_coupon_stats
    select * from ads_coupon_stats
    union
    select
        '2020-06-14' dt,
        t1.id,
        coupon_name,
        start_date,
        rule_name,
        get_count,
        order_count,
        expire_count,
        order_original_amount,
        order_final_amount,
        reduce_amount,
        reduce_rate
    from
    (
        select
            id,
            coupon_name,
            date_format(start_time,'yyyy-MM-dd') start_date,
            case
                when coupon_type='3201' then concat('满',condition_amount,'元减',benefit_amount,'元')
                when coupon_type='3202' then concat('满',condition_num,'件打', (1-benefit_discount)*10,'折')
                when coupon_type='3203' then concat('减',benefit_amount,'元')
            end rule_name
        from dim_coupon_info
        where dt='2020-06-14'
        and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
    )t1
    left join
    (
        select
            coupon_id,
            get_count,
            order_count,
            expire_count,
            order_original_amount,
            order_final_amount,
            order_reduce_amount reduce_amount,
            cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
        from dwt_coupon_topic
        where dt='2020-06-14'
    )t2
    on t1.id=t2.coupon_id;
    

# Activity Topic

# Activity Statistics

  1. Table creation statement

    DROP TABLE IF EXISTS ads_activity_stats;
    CREATE EXTERNAL TABLE `ads_activity_stats` (
      `dt` STRING COMMENT 'Statistics date',
      `activity_id` STRING COMMENT 'Activity ID',
      `activity_name` STRING COMMENT 'Activity name',
      `start_date` STRING COMMENT 'Activity start date',
      `order_count` BIGINT COMMENT 'Number of orders in the activity',
      `order_original_amount` DECIMAL(16,2) COMMENT 'Original amount of orders in the activity',
      `order_final_amount` DECIMAL(16,2) COMMENT 'Final amount of orders in the activity',
      `reduce_amount` DECIMAL(16,2) COMMENT 'Discount amount',
      `reduce_rate` DECIMAL(16,2) COMMENT 'Subsidy rate'
    ) COMMENT 'Activity statistics'
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
    LOCATION '/warehouse/gmall/ads/ads_activity_stats/';
    
  2. Data loading

    insert overwrite table ads_activity_stats
    select * from ads_activity_stats
    union
    select
        '2020-06-14' dt,
        t4.activity_id,
        activity_name,
        start_date,
        order_count,
        order_original_amount,
        order_final_amount,
        reduce_amount,
        reduce_rate
    from
    (
        select
            activity_id,
            activity_name,
            date_format(start_time,'yyyy-MM-dd') start_date
        from dim_activity_rule_info
        where dt='2020-06-14'
        and date_format(start_time,'yyyy-MM-dd')>=date_add('2020-06-14',-29)
        group by activity_id,activity_name,start_time
    )t4
    left join
    (
        select
            activity_id,
            sum(order_count) order_count,
            sum(order_original_amount) order_original_amount,
            sum(order_final_amount) order_final_amount,
            sum(order_reduce_amount) reduce_amount,
            cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
        from dwt_activity_topic
        where dt='2020-06-14'
        group by activity_id
    )t5
    on t4.activity_id=t5.activity_id;
    

# ADS Layer Business Data Import Script

  1. Create the script dwt_to_ads.sh in the /home/damoncai/bin directory

    #!/bin/bash
    
    APP=gmall
    
    # Use the date passed as the second argument if one is given; otherwise default to the day before today
    if [ -n "$2" ] ;then
        do_date=$2
    else 
        do_date=`date -d "-1 day" +%F`
    fi
    
    ads_activity_stats="
    insert overwrite table ${APP}.ads_activity_stats
    select * from ${APP}.ads_activity_stats
    union
    select
        '$do_date' dt,
        t4.activity_id,
        activity_name,
        start_date,
        order_count,
        order_original_amount,
        order_final_amount,
        reduce_amount,
        reduce_rate
    from
    (
        select
            activity_id,
            activity_name,
            date_format(start_time,'yyyy-MM-dd') start_date
        from ${APP}.dim_activity_rule_info
        where dt='$do_date'
        and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
        group by activity_id,activity_name,start_time
    )t4
    left join
    (
        select
            activity_id,
            sum(order_count) order_count,
            sum(order_original_amount) order_original_amount,
            sum(order_final_amount) order_final_amount,
            sum(order_reduce_amount) reduce_amount,
            cast(sum(order_reduce_amount)/sum(order_original_amount)*100 as decimal(16,2)) reduce_rate
        from ${APP}.dwt_activity_topic
        where dt='$do_date'
        group by activity_id
    )t5
    on t4.activity_id=t5.activity_id;
    "
    ads_coupon_stats="
    insert overwrite table ${APP}.ads_coupon_stats
    select * from ${APP}.ads_coupon_stats
    union
    select
        '$do_date' dt,
        t1.id,
        coupon_name,
        start_date,
        rule_name,
        get_count,
        order_count,
        expire_count,
        order_original_amount,
        order_final_amount,
        reduce_amount,
        reduce_rate
    from
    (
        select
            id,
            coupon_name,
            date_format(start_time,'yyyy-MM-dd') start_date,
            case
                when coupon_type='3201' then concat('满',condition_amount,'元减',benefit_amount,'元')
                when coupon_type='3202' then concat('满',condition_num,'件打', (1-benefit_discount)*10,'折')
                when coupon_type='3203' then concat('减',benefit_amount,'元')
            end rule_name
        from ${APP}.dim_coupon_info
        where dt='$do_date'
        and date_format(start_time,'yyyy-MM-dd')>=date_add('$do_date',-29)
    )t1
    left join
    (
        select
            coupon_id,
            get_count,
            order_count,
            expire_count,
            order_original_amount,
            order_final_amount,
            order_reduce_amount reduce_amount,
            cast(order_reduce_amount/order_original_amount as decimal(16,2)) reduce_rate
        from ${APP}.dwt_coupon_topic
        where dt='$do_date'
    )t2
    on t1.id=t2.coupon_id;
    "
    
    ads_order_by_province="
    insert overwrite table ${APP}.ads_order_by_province
    select * from ${APP}.ads_order_by_province
    union
    select
        dt,
        recent_days,
        province_id,
        province_name,
        area_code,
        iso_code,
        iso_3166_2,
        order_count,
        order_amount
    from
    (
        select
            '$do_date' dt,
            recent_days,
            province_id,
            sum(order_count) order_count,
            sum(order_amount) order_amount
        from
        (
            select
                recent_days,
                province_id,
                case
                    when recent_days=1 then order_last_1d_count
                    when recent_days=7 then order_last_7d_count
                    when recent_days=30 then order_last_30d_count
                end order_count,
                case
                    when recent_days=1 then order_last_1d_final_amount
                    when recent_days=7 then order_last_7d_final_amount
                    when recent_days=30 then order_last_30d_final_amount
                end order_amount
            from ${APP}.dwt_area_topic lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt='$do_date'
        )t1
        group by recent_days,province_id
    )t2
    join ${APP}.dim_base_province t3
    on t2.province_id=t3.id;
    "
    
    ads_order_spu_stats="
    insert overwrite table ${APP}.ads_order_spu_stats
    select * from ${APP}.ads_order_spu_stats
    union
    select
        '$do_date' dt,
        recent_days,
        spu_id,
        spu_name,
        tm_id,
        tm_name,
        category3_id,
        category3_name,
        category2_id,
        category2_name,
        category1_id,
        category1_name,
        sum(order_count),
        sum(order_amount)
    from
    (
        select
            recent_days,
            sku_id,
            case
                when recent_days=1 then order_last_1d_count
                when recent_days=7 then order_last_7d_count
                when recent_days=30 then order_last_30d_count
            end order_count,
            case
                when recent_days=1 then order_last_1d_final_amount
                when recent_days=7 then order_last_7d_final_amount
                when recent_days=30 then order_last_30d_final_amount
            end order_amount
        from ${APP}.dwt_sku_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='$do_date'
    )t1
    left join
    (
        select
            id,
            spu_id,
            spu_name,
            tm_id,
            tm_name,
            category3_id,
            category3_name,
            category2_id,
            category2_name,
            category1_id,
            category1_name
        from ${APP}.dim_sku_info
        where dt='$do_date'
    )t2
    on t1.sku_id=t2.id
    group by recent_days,spu_id,spu_name,tm_id,tm_name,category3_id,category3_name,category2_id,category2_name,category1_id,category1_name;
    "
    
    ads_order_total="
    insert overwrite table ${APP}.ads_order_total
    select * from ${APP}.ads_order_total
    union
    select
        '$do_date',
        recent_days,
        sum(order_count),
        sum(order_final_amount) order_final_amount,
        sum(if(order_final_amount>0,1,0)) order_user_count
    from
    (
        select
            recent_days,
            user_id,
            case when recent_days=0 then order_count
                 when recent_days=1 then order_last_1d_count
                 when recent_days=7 then order_last_7d_count
                 when recent_days=30 then order_last_30d_count
            end order_count,
            case when recent_days=0 then order_final_amount
                 when recent_days=1 then order_last_1d_final_amount
                 when recent_days=7 then order_last_7d_final_amount
                 when recent_days=30 then order_last_30d_final_amount
            end order_final_amount
        from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
        where dt='$do_date'
    )t1
    group by recent_days;
    "
    
    ads_page_path="
    insert overwrite table ${APP}.ads_page_path
    select * from ${APP}.ads_page_path
    union
    select
        '$do_date',
        recent_days,
        source,
        target,
        count(*)
    from
    (
        select
            recent_days,
            concat('step-',step,':',source) source,
            concat('step-',step+1,':',target) target
        from
        (
            select
                recent_days,
                page_id source,
                lead(page_id,1,null) over (partition by recent_days,session_id order by ts) target,
                row_number() over (partition by recent_days,session_id order by ts) step
            from
            (
                select
                    recent_days,
                    last_page_id,
                    page_id,
                    ts,
                    concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by mid_id,recent_days order by ts)) session_id
                from ${APP}.dwd_page_log lateral view explode(Array(1,7,30)) tmp as recent_days
                where dt>=date_add('$do_date',-29)
                and dt>=date_add('$do_date',-recent_days+1)
            )t2
        )t3
    )t4
    group by recent_days,source,target;
    "
    
    ads_repeat_purchase="
    insert overwrite table ${APP}.ads_repeat_purchase
    select * from ${APP}.ads_repeat_purchase
    union
    select
        '$do_date' dt,
        recent_days,
        tm_id,
        tm_name,
        cast(sum(if(order_count>=2,1,0))/sum(if(order_count>=1,1,0))*100 as decimal(16,2))
    from
    (
        select
            recent_days,
            user_id,
            tm_id,
            tm_name,
            sum(order_count) order_count
        from
        (
            select
                recent_days,
                user_id,
                sku_id,
                count(*) order_count
            from ${APP}.dwd_order_detail lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('$do_date',-29)
            and dt>=date_add('$do_date',-recent_days+1)
            group by recent_days, user_id,sku_id
        )t1
        left join
        (
            select
                id,
                tm_id,
                tm_name
            from ${APP}.dim_sku_info
            where dt='$do_date'
        )t2
        on t1.sku_id=t2.id
        group by recent_days,user_id,tm_id,tm_name
    )t3
    group by recent_days,tm_id,tm_name;
    "
    
    ads_user_action="
    with
    tmp_page as
    (
        select
            '$do_date' dt,
            recent_days,
            sum(if(array_contains(pages,'home'),1,0)) home_count,
            sum(if(array_contains(pages,'good_detail'),1,0)) good_detail_count
        from
        (
            select
                recent_days,
                mid_id,
                collect_set(page_id) pages
            from
            (
                select
                    dt,
                    mid_id,
                    page.page_id
                from ${APP}.dws_visitor_action_daycount lateral view explode(page_stats) tmp as page
                where dt>=date_add('$do_date',-29)
                and page.page_id in('home','good_detail')
            )t1 lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt>=date_add('$do_date',-recent_days+1)
            group by recent_days,mid_id
        )t2
        group by recent_days
    ),
    tmp_cop as
    (
        select
            '$do_date' dt,
            recent_days,
            sum(if(cart_count>0,1,0)) cart_count,
            sum(if(order_count>0,1,0)) order_count,
            sum(if(payment_count>0,1,0)) payment_count
        from
        (
            select
                recent_days,
                user_id,
                case
                    when recent_days=1 then cart_last_1d_count
                    when recent_days=7 then cart_last_7d_count
                    when recent_days=30 then cart_last_30d_count
                end cart_count,
                case
                    when recent_days=1 then order_last_1d_count
                    when recent_days=7 then order_last_7d_count
                    when recent_days=30 then order_last_30d_count
                end order_count,
                case
                    when recent_days=1 then payment_last_1d_count
                    when recent_days=7 then payment_last_7d_count
                    when recent_days=30 then payment_last_30d_count
                end payment_count
            from ${APP}.dwt_user_topic lateral view explode(Array(1,7,30)) tmp as recent_days
            where dt='$do_date'
        )t1
        group by recent_days
    )
    insert overwrite table ${APP}.ads_user_action
    select * from ${APP}.ads_user_action
    union
    select
        tmp_page.dt,
        tmp_page.recent_days,
        home_count,
        good_detail_count,
        cart_count,
        order_count,
        payment_count
    from tmp_page
    join tmp_cop
    on tmp_page.recent_days=tmp_cop.recent_days;
    "
    
    ads_user_change="
    insert overwrite table ${APP}.ads_user_change
    select * from ${APP}.ads_user_change
    union
    select
        churn.dt,
        user_churn_count,
        user_back_count
    from
    (
        select
            '$do_date' dt,
            count(*) user_churn_count
        from ${APP}.dwt_user_topic
        where dt='$do_date'
        and login_date_last=date_add('$do_date',-7)
    )churn
    join
    (
        select
            '$do_date' dt,
            count(*) user_back_count
        from
        (
            select
                user_id,
                login_date_last
            from ${APP}.dwt_user_topic
            where dt='$do_date'
            and login_date_last='$do_date'
        )t1
        join
        (
            select
                user_id,
                login_date_last login_date_previous
            from ${APP}.dwt_user_topic
            where dt=date_add('$do_date',-1)
        )t2
        on t1.user_id=t2.user_id
        where datediff(login_date_last,login_date_previous)>=8
    )back
    on churn.dt=back.dt;
    "
    
    ads_user_retention="
    insert overwrite table ${APP}.ads_user_retention
    select * from ${APP}.ads_user_retention
    union
    select
        '$do_date',
        login_date_first create_date,
        datediff('$do_date',login_date_first) retention_day,
        sum(if(login_date_last='$do_date',1,0)) retention_count,
        count(*) new_user_count,
        cast(sum(if(login_date_last='$do_date',1,0))/count(*)*100 as decimal(16,2)) retention_rate
    from ${APP}.dwt_user_topic
    where dt='$do_date'
    and login_date_first>=date_add('$do_date',-7)
    and login_date_first<'$do_date'
    group by login_date_first;
    "
    
    ads_user_total="
    insert overwrite table ${APP}.ads_user_total
    select * from ${APP}.ads_user_total
    union
    select
        '$do_date',
        recent_days,
        sum(if(login_date_first>=recent_days_ago,1,0)) new_user_count,
        sum(if(order_date_first>=recent_days_ago,1,0)) new_order_user_count,
        sum(order_final_amount) order_final_amount,
        sum(if(order_final_amount>0,1,0)) order_user_count,
        sum(if(login_date_last>=recent_days_ago and order_final_amount=0,1,0)) no_order_user_count
    from
    (
        select
            recent_days,
            user_id,
            login_date_first,
            login_date_last,
            order_date_first,
            case when recent_days=0 then order_final_amount
                 when recent_days=1 then order_last_1d_final_amount
                 when recent_days=7 then order_last_7d_final_amount
                 when recent_days=30 then order_last_30d_final_amount
            end order_final_amount,
            if(recent_days=0,'1970-01-01',date_add('$do_date',-recent_days+1)) recent_days_ago
        from ${APP}.dwt_user_topic lateral view explode(Array(0,1,7,30)) tmp as recent_days
        where dt='$do_date'
    )t1
    group by recent_days;
    "
    
    ads_visit_stats="
    insert overwrite table ${APP}.ads_visit_stats
    select * from ${APP}.ads_visit_stats
    union
    select
        '$do_date' dt,
        is_new,
        recent_days,
        channel,
        count(distinct(mid_id)) uv_count,
        cast(sum(duration)/1000 as bigint) duration_sec,
        cast(avg(duration)/1000 as bigint) avg_duration_sec,
        sum(page_count) page_count,
        cast(avg(page_count) as bigint) avg_page_count,
        count(*) sv_count,
        sum(if(page_count=1,1,0)) bounce_count,
        cast(sum(if(page_count=1,1,0))/count(*)*100 as decimal(16,2)) bounce_rate
    from
    (
        select
            session_id,
            mid_id,
            is_new,
            recent_days,
            channel,
            count(*) page_count,
            sum(during_time) duration
        from
        (
            select
                mid_id,
                channel,
                recent_days,
                is_new,
                last_page_id,
                page_id,
                during_time,
                concat(mid_id,'-',last_value(if(last_page_id is null,ts,null),true) over (partition by recent_days,mid_id order by ts)) session_id
            from
            (
                select
                    mid_id,
                    channel,
                    last_page_id,
                    page_id,
                    during_time,
                    ts,
                    recent_days,
                    if(visit_date_first>=date_add('$do_date',-recent_days+1),'1','0') is_new
                from
                (
                    select
                        t1.mid_id,
                        t1.channel,
                        t1.last_page_id,
                        t1.page_id,
                        t1.during_time,
                        t1.dt,
                        t1.ts,
                        t2.visit_date_first
                    from
                    (
                        select
                            mid_id,
                            channel,
                            last_page_id,
                            page_id,
                            during_time,
                            dt,
                            ts
                        from ${APP}.dwd_page_log
                        where dt>=date_add('$do_date',-30)
                    )t1
                    left join
                    (
                        select
                            mid_id,
                            visit_date_first
                        from ${APP}.dwt_visitor_topic
                        where dt='$do_date'
                    )t2
                    on t1.mid_id=t2.mid_id
                )t3 lateral view explode(Array(1,7,30)) tmp as recent_days
                where dt>=date_add('$do_date',-recent_days+1)
            )t4
        )t5
        group by session_id,mid_id,is_new,recent_days,channel
    )t6
    group by is_new,recent_days,channel;
    "
    
    case $1 in
        "ads_activity_stats" )
            hive -e "$ads_activity_stats" 
        ;;
        "ads_coupon_stats" )
            hive -e "$ads_coupon_stats"
        ;;
        "ads_order_by_province" )
            hive -e "$ads_order_by_province" 
        ;;
        "ads_order_spu_stats" )
            hive -e "$ads_order_spu_stats" 
        ;;
        "ads_order_total" )
            hive -e "$ads_order_total" 
        ;;
        "ads_page_path" )
            hive -e "$ads_page_path" 
        ;;
        "ads_repeat_purchase" )
            hive -e "$ads_repeat_purchase" 
        ;;
        "ads_user_action" )
            hive -e "$ads_user_action" 
        ;;
        "ads_user_change" )
            hive -e "$ads_user_change" 
        ;;
        "ads_user_retention" )
            hive -e "$ads_user_retention" 
        ;;
        "ads_user_total" )
            hive -e "$ads_user_total" 
        ;;
        "ads_visit_stats" )
            hive -e "$ads_visit_stats" 
        ;;
        "all" )
            hive -e "$ads_activity_stats$ads_coupon_stats$ads_order_by_province$ads_order_spu_stats$ads_order_total$ads_page_path$ads_repeat_purchase$ads_user_action$ads_user_change$ads_user_retention$ads_user_total$ads_visit_stats"
        ;;
    esac
    
  2. Grant the script execute permission

  3. Run the script

    dwt_to_ads.sh all 2020-06-14
    

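The queries in dwt_to_ads.sh select each look-back window with `date_add('$do_date',-recent_days+1)`, so a window of N days ends on the run date and starts N-1 days earlier. A minimal shell sketch of the same arithmetic, assuming GNU `date` is available; the function name `window_start` is hypothetical and not part of the original script:

```shell
#!/bin/bash
# Hypothetical helper (not in the original script): first day of an
# N-day window that ends on the given run date, mirroring
# date_add('$do_date', -recent_days+1) in the Hive queries.
# Assumes GNU date (the -d option).
window_start() {
  local do_date=$1 recent_days=$2
  date -u -d "$do_date $((recent_days - 1)) days ago" +%F
}

# Print the first day of each window used by the ADS queries.
for n in 1 7 30; do
  echo "recent_days=$n starts on $(window_start 2020-06-14 "$n")"
done
```

For the run date 2020-06-14 used above, the 1-day window is that day alone and the 7-day window starts on 2020-06-08.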
# Azkaban

# Installation and Usage

# Creating the Database and Tables

  1. Create the gmall_report database

  2. Create the tables

    DROP TABLE IF EXISTS ads_visit_stats;
    CREATE TABLE `ads_visit_stats` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `is_new` VARCHAR(255) NOT NULL COMMENT 'New/returning visitor flag, 1: new, 0: returning',
      `recent_days` INT NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `channel` VARCHAR(255) NOT NULL COMMENT 'Channel',
      `uv_count` BIGINT(20) DEFAULT NULL COMMENT 'Daily active visitors (number of visitors)',
      `duration_sec` BIGINT(20) DEFAULT NULL COMMENT 'Total page dwell time',
      `avg_duration_sec` BIGINT(20) DEFAULT NULL COMMENT 'Average page dwell time per session',
      `page_count` BIGINT(20) DEFAULT NULL COMMENT 'Total page views',
      `avg_page_count` BIGINT(20) DEFAULT NULL COMMENT 'Average page views per session',
      `sv_count` BIGINT(20) DEFAULT NULL COMMENT 'Session count',
      `bounce_count` BIGINT(20) DEFAULT NULL COMMENT 'Bounce count',
      `bounce_rate` DECIMAL(16,2) DEFAULT NULL COMMENT 'Bounce rate',
      PRIMARY KEY (`dt`,`recent_days`,`is_new`,`channel`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;
    
    DROP TABLE IF EXISTS ads_page_path;
    CREATE TABLE `ads_page_path` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `source` VARCHAR(255) DEFAULT NULL COMMENT 'Source page of the jump',
      `target` VARCHAR(255) DEFAULT NULL COMMENT 'Target page of the jump',
      `path_count` BIGINT(255) DEFAULT NULL COMMENT 'Jump count',
      UNIQUE KEY (`dt`,`recent_days`,`source`,`target`) USING BTREE
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
    
    DROP TABLE IF EXISTS ads_user_total;
    CREATE TABLE `ads_user_total` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 0: cumulative, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `new_user_count` BIGINT(20) DEFAULT NULL COMMENT 'Newly registered user count',
      `new_order_user_count` BIGINT(20) DEFAULT NULL COMMENT 'New ordering user count',
      `order_final_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Total order amount',
      `order_user_count` BIGINT(20) DEFAULT NULL COMMENT 'Ordering user count',
      `no_order_user_count` BIGINT(20) DEFAULT NULL COMMENT 'Non-ordering user count (active users who placed no order)',
      PRIMARY KEY (`dt`,`recent_days`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;
    
    DROP TABLE IF EXISTS ads_user_change;
    CREATE TABLE `ads_user_change` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `user_churn_count` BIGINT(20) DEFAULT NULL COMMENT 'Churned user count',
      `user_back_count` BIGINT(20) DEFAULT NULL COMMENT 'Returning user count',
      PRIMARY KEY (`dt`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;
    
    DROP TABLE IF EXISTS ads_user_action;
    CREATE TABLE `ads_user_action` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `home_count` BIGINT(20) DEFAULT NULL COMMENT 'Users who viewed the home page',
      `good_detail_count` BIGINT(20) DEFAULT NULL COMMENT 'Users who viewed a product detail page',
      `cart_count` BIGINT(20) DEFAULT NULL COMMENT 'Users who added to cart',
      `order_count` BIGINT(20) DEFAULT NULL COMMENT 'Users who placed an order',
      `payment_count` BIGINT(20) DEFAULT NULL COMMENT 'Users who paid',
      PRIMARY KEY (`dt`,`recent_days`) USING BTREE
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
    
    DROP TABLE IF EXISTS ads_user_retention;
    CREATE TABLE `ads_user_retention` (
      `dt` DATE DEFAULT NULL COMMENT 'Statistics date',
      `create_date` VARCHAR(255) NOT NULL COMMENT 'User creation date',
      `retention_day` BIGINT(20) NOT NULL COMMENT 'Retention days as of the current date',
      `retention_count` BIGINT(20) DEFAULT NULL COMMENT 'Retained user count',
      `new_user_count` BIGINT(20) DEFAULT NULL COMMENT 'New user count',
      `retention_rate` DECIMAL(16,2) DEFAULT NULL COMMENT 'Retention rate',
      PRIMARY KEY (`create_date`,`retention_day`) USING BTREE
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
    
    DROP TABLE IF EXISTS ads_order_total;
    CREATE TABLE `ads_order_total` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `order_count` BIGINT(255) DEFAULT NULL COMMENT 'Order count',
      `order_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Order amount',
      `order_user_count` BIGINT(255) DEFAULT NULL COMMENT 'Ordering user count',
      PRIMARY KEY (`dt`,`recent_days`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
    
    DROP TABLE IF EXISTS ads_order_by_province;
    CREATE TABLE `ads_order_by_province` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `province_id` VARCHAR(255) NOT NULL COMMENT 'Province ID',
      `province_name` VARCHAR(255) DEFAULT NULL COMMENT 'Province name',
      `area_code` VARCHAR(255) DEFAULT NULL COMMENT 'Area code',
      `iso_code` VARCHAR(255) DEFAULT NULL COMMENT 'International standard region code',
      `iso_code_3166_2` VARCHAR(255) DEFAULT NULL COMMENT 'International standard region code (ISO 3166-2)',
      `order_count` BIGINT(20) DEFAULT NULL COMMENT 'Order count',
      `order_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Order amount',
      PRIMARY KEY (`dt`,`recent_days`,`province_id`) USING BTREE
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
    
    DROP TABLE IF EXISTS ads_repeat_purchase;
    CREATE TABLE `ads_repeat_purchase` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `tm_id` VARCHAR(255) NOT NULL COMMENT 'Brand ID',
      `tm_name` VARCHAR(255) DEFAULT NULL COMMENT 'Brand name',
      `order_repeat_rate` DECIMAL(16,2) DEFAULT NULL COMMENT 'Repeat purchase rate',
      PRIMARY KEY (`dt`,`recent_days`,`tm_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
    
    DROP TABLE IF EXISTS ads_order_spu_stats;
    CREATE TABLE `ads_order_spu_stats` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `recent_days` BIGINT(20) NOT NULL COMMENT 'Recent days, 1: last 1 day, 7: last 7 days, 30: last 30 days',
      `spu_id` VARCHAR(255) NOT NULL COMMENT 'Product ID',
      `spu_name` VARCHAR(255) DEFAULT NULL COMMENT 'Product name',
      `tm_id` VARCHAR(255) NOT NULL COMMENT 'Brand ID',
      `tm_name` VARCHAR(255) DEFAULT NULL COMMENT 'Brand name',
      `category3_id` VARCHAR(255) NOT NULL COMMENT 'Level-3 category ID',
      `category3_name` VARCHAR(255) DEFAULT NULL COMMENT 'Level-3 category name',
      `category2_id` VARCHAR(255) NOT NULL COMMENT 'Level-2 category ID',
      `category2_name` VARCHAR(255) DEFAULT NULL COMMENT 'Level-2 category name',
      `category1_id` VARCHAR(255) NOT NULL COMMENT 'Level-1 category ID',
      `category1_name` VARCHAR(255) NOT NULL COMMENT 'Level-1 category name',
      `order_count` BIGINT(20) DEFAULT NULL COMMENT 'Order count',
      `order_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Order amount',
      PRIMARY KEY (`dt`,`recent_days`,`spu_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;
    
    DROP TABLE IF EXISTS ads_activity_stats;
    CREATE TABLE `ads_activity_stats` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `activity_id` VARCHAR(255) NOT NULL COMMENT 'Activity ID',
      `activity_name` VARCHAR(255) DEFAULT NULL COMMENT 'Activity name',
      `start_date` DATE DEFAULT NULL COMMENT 'Start date',
      `order_count` BIGINT(11) DEFAULT NULL COMMENT 'Number of orders in the activity',
      `order_original_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Original amount of orders in the activity',
      `order_final_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Final amount of orders in the activity',
      `reduce_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Discount amount',
      `reduce_rate` DECIMAL(16,2) DEFAULT NULL COMMENT 'Subsidy rate',
      PRIMARY KEY (`dt`,`activity_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
    
    DROP TABLE IF EXISTS ads_coupon_stats;
    CREATE TABLE `ads_coupon_stats` (
      `dt` DATE NOT NULL COMMENT 'Statistics date',
      `coupon_id` VARCHAR(255) NOT NULL COMMENT 'Coupon ID',
      `coupon_name` VARCHAR(255) DEFAULT NULL COMMENT 'Coupon name',
      `start_date` DATE DEFAULT NULL COMMENT 'Start date',
      `rule_name` VARCHAR(200) DEFAULT NULL COMMENT 'Discount rule',
      `get_count` BIGINT(20) DEFAULT NULL COMMENT 'Times claimed',
      `order_count` BIGINT(20) DEFAULT NULL COMMENT 'Times used (orders placed)',
      `expire_count` BIGINT(20) DEFAULT NULL COMMENT 'Times expired',
      `order_original_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Original amount of orders using the coupon',
      `order_final_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Final amount of orders using the coupon',
      `reduce_amount` DECIMAL(16,2) DEFAULT NULL COMMENT 'Discount amount',
      `reduce_rate` DECIMAL(16,2) DEFAULT NULL COMMENT 'Subsidy rate',
      PRIMARY KEY (`dt`,`coupon_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ROW_FORMAT=DYNAMIC;
    

# Sqoop Export Script

  1. Create the script hdfs_to_mysql.sh in the /home/damoncai/bin directory

    #!/bin/bash
    
    hive_db_name=gmall
    mysql_db_name=gmall_report
    
    # Export one ADS table from HDFS to MySQL.
    #   $1: table name (the same in Hive and MySQL)
    #   $2: comma-separated columns that --update-key uses to match existing rows
    export_data() {
    /opt/module/sqoop/bin/sqoop export \
    --connect "jdbc:mysql://ha01:3306/${mysql_db_name}?useUnicode=true&characterEncoding=utf-8" \
    --username root \
    --password 000000 \
    --table "$1" \
    --num-mappers 1 \
    --export-dir "/warehouse/$hive_db_name/ads/$1" \
    --input-fields-terminated-by "\t" \
    --update-mode allowinsert \
    --update-key "$2" \
    --input-null-string '\\N' \
    --input-null-non-string '\\N'
    }
    
    case $1 in
      "ads_activity_stats" )
        export_data "ads_activity_stats" "dt,activity_id"
      ;;
    
      "ads_coupon_stats" )
        export_data "ads_coupon_stats" "dt,coupon_id"
      ;;
    
      "ads_order_by_province" )
        export_data "ads_order_by_province" "dt,recent_days,province_id"
      ;;
    
      "ads_order_spu_stats" )
        export_data "ads_order_spu_stats" "dt,recent_days,spu_id"
      ;;
    
      "ads_order_total" )
        export_data "ads_order_total" "dt,recent_days"
      ;;
    
      "ads_page_path" )
        export_data "ads_page_path" "dt,recent_days,source,target"
      ;;
    
      "ads_repeat_purchase" )
        export_data "ads_repeat_purchase" "dt,recent_days,tm_id"
      ;;
    
      "ads_user_action" )
        export_data "ads_user_action" "dt,recent_days"
      ;;
    
      "ads_user_change" )
        export_data "ads_user_change" "dt"
      ;;
    
      "ads_user_retention" )
        export_data "ads_user_retention" "create_date,retention_day"
      ;;
    
      "ads_user_total" )
        export_data "ads_user_total" "dt,recent_days"
      ;;
    
      "ads_visit_stats" )
        export_data "ads_visit_stats" "dt,recent_days,is_new,channel"
      ;;
      "all" )
        export_data "ads_activity_stats" "dt,activity_id"
        export_data "ads_coupon_stats" "dt,coupon_id"
        export_data "ads_order_by_province" "dt,recent_days,province_id"
        export_data "ads_order_spu_stats" "dt,recent_days,spu_id"
        export_data "ads_order_total" "dt,recent_days"
        export_data "ads_page_path" "dt,recent_days,source,target"
        export_data "ads_repeat_purchase" "dt,recent_days,tm_id"
        export_data "ads_user_action" "dt,recent_days"
        export_data "ads_user_change" "dt"
        export_data "ads_user_retention" "create_date,retention_day"
        export_data "ads_user_total" "dt,recent_days"
        export_data "ads_visit_stats" "dt,recent_days,is_new,channel"
      ;;
    esac
    

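    About whether an export updates or inserts

    --update-mode:

    updateonly: only updates existing rows; new rows cannot be inserted

    allowinsert: updates existing rows and also allows new rows to be inserted

    --update-key: when updates are allowed, specifies the columns whose values must match for a row to be treated as the same record, so that it is updated rather than inserted. Separate multiple columns with commas.

    --input-null-string and --input-null-non-string:

    These specify, for string columns and non-string columns respectively, which input string is to be interpreted as NULL.

    Hive stores NULL as "\N" in its underlying files, whereas MySQL stores NULL as NULL. To keep the data consistent on both ends, use --input-null-string and --input-null-non-string when exporting, and --null-string and --null-non-string when importing.

  2. Add execute permission

  3. Run the Sqoop script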

    hdfs_to_mysql.sh all
    

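To make the NULL handling above concrete: a tab-separated Hive text file carries `\N` wherever a column is NULL, and the two `--input-null-*` options tell Sqoop to read that marker as NULL. The sketch below only imitates that interpretation with awk on a sample row. The function name `show_nulls` and the sample row are hypothetical, and this is not how Sqoop itself is implemented:

```shell
#!/bin/bash
# Hypothetical illustration: read tab-separated Hive text rows and print
# them comma-separated, showing the \N marker as NULL.
show_nulls() {
  awk -F'\t' -v OFS=',' '{
    for (i = 1; i <= NF; i++) {
      if ($i == "\\N") $i = "NULL"
    }
    $1 = $1   # force the record to be rebuilt with OFS
    print
  }'
}

# Sample row: dt, user_churn_count (NULL in Hive), user_back_count
printf '2020-06-14\t\\N\t3\n' | show_nulls
```

Without the two options, the literal two-character string `\N` would be handed to MySQL as a value instead of NULL, which is why the export script sets both.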
# Full Scheduling Workflow

# Data Preparation

# User Behavior Data

  1. Edit application.properties under /opt/module/applog

    # Business date
    mock.date=2020-06-15

    Note: distribute the file to the other nodes that need to generate data.

    xsync application.properties

  2. Generate the data

    lg.sh
    
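    Note: after generating the data, remember to check that it exists in HDFS!

# Business Data Preparation

  1. Edit application.properties under /opt/module/db_log

    # Business date
    mock.date=2020-06-15

  2. Generate the data

    java -jar gmall2020-mock-db-2020-04-01.jar

  3. Query the order_info table and confirm that operate_time contains rows dated 2020-06-15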

# Writing the Azkaban Workflow Configuration Files

  1. Create the azkaban.project file with the following content

    azkaban-flow-version: 2.0
    
  2. Create the gmall.flow file with the following content

    nodes:
      - name: mysql_to_hdfs
        type: command
        config:
         command: /home/damoncai/bin/mysql_to_hdfs.sh all ${dt}
        
      - name: hdfs_to_ods_log
        type: command
        config:
         command: /home/damoncai/bin/hdfs_to_ods_log.sh ${dt}
         
      - name: hdfs_to_ods_db
        type: command
        dependsOn: 
         - mysql_to_hdfs
        config: 
         command: /home/damoncai/bin/hdfs_to_ods_db.sh all ${dt}
      
      - name: ods_to_dim_db
        type: command
        dependsOn: 
         - hdfs_to_ods_db
        config: 
         command: /home/damoncai/bin/ods_to_dim_db.sh all ${dt}
    
      - name: ods_to_dwd_log
        type: command
        dependsOn: 
         - hdfs_to_ods_log
        config: 
         command: /home/damoncai/bin/ods_to_dwd_log.sh all ${dt}
        
      - name: ods_to_dwd_db
        type: command
        dependsOn: 
         - hdfs_to_ods_db
        config: 
         command: /home/damoncai/bin/ods_to_dwd_db.sh all ${dt}
        
      - name: dwd_to_dws
        type: command
        dependsOn:
         - ods_to_dim_db
         - ods_to_dwd_log
         - ods_to_dwd_db
        config:
         command: /home/damoncai/bin/dwd_to_dws.sh all ${dt}
        
      - name: dws_to_dwt
        type: command
        dependsOn:
         - dwd_to_dws
        config:
         command: /home/damoncai/bin/dws_to_dwt.sh all ${dt}
        
      - name: dwt_to_ads
        type: command
        dependsOn: 
         - dws_to_dwt
        config:
         command: /home/damoncai/bin/dwt_to_ads.sh all ${dt}
         
      - name: hdfs_to_mysql
        type: command
        dependsOn:
         - dwt_to_ads
        config:
          command: /home/damoncai/bin/hdfs_to_mysql.sh all
    
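  3. Compress azkaban.project and gmall.flow into a single zip file; the file name must be in English (gmall.zip).

  4. Create a new project on the web server: http://ha01:8081/index

  5. Upload the gmall.zip file

  6. View the task flow

  7. View the detailed task flow

  8. Enter the dt date parameter

  9. Check the data in MySQL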

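The dependsOn entries in gmall.flow form a dependency graph, and a node can only start once the nodes it depends on have finished. One way to sanity-check the resulting order before uploading is to feed the same edges to `tsort` from GNU coreutils. The sketch below hard-codes the edges by hand (an assumption: it does not parse the YAML), and the function name `flow_order` is hypothetical:

```shell
#!/bin/bash
# Hypothetical check: each line is "prerequisite dependent", copied by
# hand from the dependsOn entries in gmall.flow. tsort prints one valid
# execution order (several orders may be valid).
flow_order() {
  tsort <<'EOF'
mysql_to_hdfs hdfs_to_ods_db
hdfs_to_ods_log ods_to_dwd_log
hdfs_to_ods_db ods_to_dim_db
hdfs_to_ods_db ods_to_dwd_db
ods_to_dim_db dwd_to_dws
ods_to_dwd_log dwd_to_dws
ods_to_dwd_db dwd_to_dws
dwd_to_dws dws_to_dwt
dws_to_dwt dwt_to_ads
dwt_to_ads hdfs_to_mysql
EOF
}

flow_order
```

Whatever order tsort picks for the independent branches, hdfs_to_mysql comes last because every other node leads to it.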
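# Notes on Azkaban Multi-Executor Mode

Azkaban's multi-Executor mode means that Executors are deployed on multiple nodes of the cluster. In this mode, the Azkaban web server selects one of the Executors, according to its policy, to run each task.

Because the scripts we hand to Azkaban for scheduling, and the applications those scripts need (Hive, Sqoop, and so on), are deployed only on ha01, we must pick one of the following two plans to make sure tasks run successfully. Plan 2 is recommended.

Plan 1: designate a specific Executor (ha01) to run the tasks.

  1. In the executors table of the azkaban database in MySQL, look up the id of the Executor on ha01.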

    mysql> use azkaban;
    Reading table information for completion of table and column names
    You can turn off this feature to get a quicker startup with -A
    
    Database changed
    mysql> select * from executors;
    +----+-----------+-------+--------+
    | id | host      | port  | active |
    +----+-----------+-------+--------+
    |  1 | hadoop103 | 35985 |      1 |
    |  2 | hadoop104 | 36363 |      1 |
    |  3 | hadoop102 | 12321 |      1 |
    +----+-----------+-------+--------+
    
  2. Add the useExecutor property when executing the workflow

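Plan 2: deploy the scripts and applications required by the tasks on every node that runs an Executor

  1. Distribute the scripts, Hive, Sqoop, Spark, and my_env.sh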

    xsync /home/damoncai/bin/
    xsync /opt/module/hive
    xsync /opt/module/sqoop
    xsync /opt/module/spark
    sudo /home/damoncai/bin/xsync /etc/profile.d/my_env.sh
    
  2. After distribution, reload the environment variable configuration file on ha02 and ha03, then restart Azkaban

Last Updated: 2/19/2022, 10:05:37 PM