kettle类型转换或者不同类型的比较
今天kettle“突然”不能正常工作了:源中的新数据没有新增到目标数据库。
想来想去,我曾经把目标表的user_id字段从decimal改为VARCHAR(10),所以kettle在识别关键字、进行比较时,出现了问题。
那么为何最开始没有出问题呢?——最开始时目标表是空的,所以一次性新增了过去。(标黄色,可能是这个原因)
那么就是数字和Null可以进行比较,而数字和字符无法进行比较?——比如目标表是空的,可以新增1001为'1001'。而再次比较1001和'1001'时则报错,转换就直接终止了:当源新增1002时,无法在目标中新增'1002'。
那么如何解决这个类型不一致的比较问题呢?想了想,这不是kettle应该解决的问题,SQL应该支持这种转换,在MySQL的文档中找到了转换的方法:
CAST(DEALER_ID AS CHAR(10)) AS DEALER_ID (奇怪的是为何使用CHAR这个名字而不是统一为VARCHAR——我也太多虑了)。
与其猜测某些原因,不如翻看日志(这让我找到了真正的解释),日志应该是第一时间想到的:
转换错误的日志:
2021/12/10 15:06:07 - Kitchen - Logging is at level : 基本日志
2021/12/10 15:06:07 - Kitchen - Start of run.
2021/12/10 15:06:13 - dms2fr_wechat - 开始执行任务
2021/12/10 15:06:13 - dms2fr_wechat - 开始项[get_var]
2021/12/10 15:06:13 - get_var - Using run configuration [Pentaho local]
2021/12/10 15:06:13 - get_var - Using legacy execution engine
2021/12/10 15:06:13 - get_var - 为了转换解除补丁开始 [get_var]
2021/12/10 15:06:13 - get_var.0 - Finished reading query, closing connection.
2021/12/10 15:06:13 - get_var.0 - 完成处理 (I=1, O=0, R=0, W=1, U=0, E=0)
2021/12/10 15:06:13 - var_sync_time.0 - Setting environment variables...
2021/12/10 15:06:13 - var_sync_time.0 - Set variable var_sync_time to value [2021/12/10 14:53:10.000000000]
2021/12/10 15:06:13 - var_sync_time.0 - Finished after 1 rows.
2021/12/10 15:06:13 - var_sync_time.0 - 完成处理 (I=0, O=0, R=1, W=1, U=0, E=0)
2021/12/10 15:06:13 - dms2fr_wechat - 开始项[update_sync_time]
2021/12/10 15:06:14 - dms2fr_wechat - 开始项[dms2fr]
2021/12/10 15:06:14 - dms2fr - Using run configuration [Pentaho local]
2021/12/10 15:06:14 - dms2fr - Using legacy execution engine
2021/12/10 15:06:14 - dms2fr_wechat - 为了转换解除补丁开始 [dms2fr_wechat]
2021/12/10 15:06:14 - dt_vs_finance_pay.0 - Finished reading query, closing connection.
2021/12/10 15:06:14 - dt_bb_dealers.0 - Finished reading query, closing connection.
2021/12/10 15:06:14 - dt_bb_dealers.0 - 完成处理 (I=625, O=0, R=0, W=625, U=0, E=0)
2021/12/10 15:06:14 - user.0 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : Error in step, asking everyone to stop because of:
2021/12/10 15:06:14 - user.0 - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : org.pentaho.di.core.exception.KettleDatabaseException:
2021/12/10 15:06:14 - user.0 - Error inserting/updating row
2021/12/10 15:06:14 - user.0 - Data truncation: Truncated incorrect DOUBLE value: 'J211184'
2021/12/10 15:06:14 - user.0 -
2021/12/10 15:06:14 - user.0 - at org.pentaho.di.core.database.Database.insertRow(Database.java:1319)
2021/12/10 15:06:14 - user.0 - at org.pentaho.di.core.database.Database.insertRow(Database.java:1243)
2021/12/10 15:06:14 - user.0 - at org.pentaho.di.core.database.Database.insertRow(Database.java:1231)
2021/12/10 15:06:14 - user.0 - at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.lookupValues(InsertUpdate.java:163)
2021/12/10 15:06:14 - user.0 - at org.pentaho.di.trans.steps.insertupdate.InsertUpdate.processRow(InsertUpdate.java:299)
2021/12/10 15:06:14 - user.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:62)
2021/12/10 15:06:14 - user.0 - at java.lang.Thread.run(Thread.java:748)
2021/12/10 15:06:14 - user.0 - Caused by: com.mysql.jdbc.MysqlDataTruncation: Data truncation: Truncated incorrect DOUBLE value: 'J211184'
2021/12/10 15:06:14 - user.0 - at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4185)
2021/12/10 15:06:14 - user.0 - at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4119)
2021/12/10 15:06:14 - user.0 - at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
2021/12/10 15:06:14 - user.0 - at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
2021/12/10 15:06:14 - user.0 - at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2799)
2021/12/10 15:06:14 - user.0 - at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2139)
2021/12/10 15:06:14 - user.0 - at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2442)
2021/12/10 15:06:14 - user.0 - at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2359)
2021/12/10 15:06:14 - user.0 - at com.mysql.jdbc.PreparedStatement.executeUpdate(PreparedStatement.java:2343)
2021/12/10 15:06:14 - user.0 - at org.pentaho.di.core.database.Database.insertRow(Database.java:1286)
2021/12/10 15:06:14 - user.0 - ... 6 more
2021/12/10 15:06:14 - dms2fr_wechat - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : 错误被检测到!
2021/12/10 15:06:14 - user.0 - 完成处理 (I=203, O=0, R=203, W=202, U=0, E=1)
2021/12/10 15:06:14 - Carte - Installing timer to purge stale objects after 1440 minutes.
2021/12/10 15:06:14 - dms2fr_wechat - 转换被检测
2021/12/10 15:06:14 - dms2fr_wechat - 转换正在杀死其他步骤!
2021/12/10 15:06:14 - dms2fr_wechat - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : 错误被检测到!
2021/12/10 15:06:14 - dms2fr_wechat - 完成作业项[dms2fr] (结果=[false])
2021/12/10 15:06:14 - dms2fr_wechat - 完成作业项[update_sync_time] (结果=[false])
2021/12/10 15:06:14 - dms2fr_wechat - 完成作业项[get_var] (结果=[false])
2021/12/10 15:06:14 - dms2fr_wechat - 任务执行完毕
2021/12/10 15:06:14 - Kitchen - Finished!
2021/12/10 15:06:14 - Kitchen - ERROR (version 8.2.0.0-342, build 8.2.0.0-342 from 2018-11-14 10.30.55 by buildguy) : Finished with errors
2021/12/10 15:06:14 - Kitchen - Start=2021/12/10 15:06:07.765, Stop=2021/12/10 15:06:14.868
2021/12/10 15:06:14 - Kitchen - Processing ended after 7 seconds.
改正后的日志:
2021/12/10 15:33:08 - Kitchen - Logging is at level : 基本日志
2021/12/10 15:33:08 - Kitchen - Start of run.
2021/12/10 15:33:13 - dms2fr_wechat - 开始执行任务
2021/12/10 15:33:13 - dms2fr_wechat - 开始项[get_var]
2021/12/10 15:33:14 - get_var - Using run configuration [Pentaho local]
2021/12/10 15:33:14 - get_var - Using legacy execution engine
2021/12/10 15:33:14 - get_var - 为了转换解除补丁开始 [get_var]
2021/12/10 15:33:14 - get_var.0 - Finished reading query, closing connection.
2021/12/10 15:33:14 - get_var.0 - 完成处理 (I=1, O=0, R=0, W=1, U=0, E=0)
2021/12/10 15:33:14 - var_sync_time.0 - Setting environment variables...
2021/12/10 15:33:14 - var_sync_time.0 - Set variable var_sync_time to value [2021/12/10 15:20:09.000000000]
2021/12/10 15:33:14 - var_sync_time.0 - Finished after 1 rows.
2021/12/10 15:33:14 - var_sync_time.0 - 完成处理 (I=0, O=0, R=1, W=1, U=0, E=0)
2021/12/10 15:33:14 - dms2fr_wechat - 开始项[update_sync_time]
2021/12/10 15:33:14 - dms2fr_wechat - 开始项[dms2fr]
2021/12/10 15:33:14 - dms2fr - Using run configuration [Pentaho local]
2021/12/10 15:33:14 - dms2fr - Using legacy execution engine
2021/12/10 15:33:14 - dms2fr_wechat - 为了转换解除补丁开始 [dms2fr_wechat]
2021/12/10 15:33:14 - dt_vs_finance_pay.0 - Finished reading query, closing connection.
2021/12/10 15:33:14 - dt_bb_dealers.0 - Finished reading query, closing connection.
2021/12/10 15:33:14 - dt_bb_dealers.0 - 完成处理 (I=625, O=0, R=0, W=625, U=0, E=0)
2021/12/10 15:33:15 - user.0 - 完成处理 (I=625, O=0, R=625, W=625, U=0, E=0)
2021/12/10 15:33:15 - Carte - Installing timer to purge stale objects after 1440 minutes.
2021/12/10 15:33:15 - dms2fr_wechat - 完成作业项[dms2fr] (结果=[true])
2021/12/10 15:33:15 - dms2fr_wechat - 完成作业项[update_sync_time] (结果=[true])
2021/12/10 15:33:15 - dms2fr_wechat - 完成作业项[get_var] (结果=[true])
2021/12/10 15:33:15 - dms2fr_wechat - 任务执行完毕
2021/12/10 15:33:15 - Kitchen - Finished!
2021/12/10 15:33:15 - Kitchen - Start=2021/12/10 15:33:08.119, Stop=2021/12/10 15:33:15.957
2021/12/10 15:33:15 - Kitchen - Processing ended after 7 seconds.
结论:kettle可以自动进行数字和字符串的比较,前提是这个字符串要能够转换为数字;但正确的做法是显式转换。(同样,日期和字符串的比较应该类似,但不常见。)
它是以源中的数据类型(decimal)为基础,将目标中的类型(CHAR)转换为源中的类型。
我更改类型本来就是基于兼容的考虑,要从其他渠道新增以字母开头的用户ID;现在kettle识别'J211184',将其转为数字时抛出了异常。