My requirements are:
- Move data from Oracle to HDFS
- Process the data on HDFS
- Move processed data to Teradata
The whole process also needs to run every 15 minutes. The source data can be close to 50 GB per cycle, and the processed data can be about the same size.
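For what it's worth, the 15-minute schedule itself is straightforward with cron; a minimal crontab entry might look like this (the script and log paths are placeholders):

```sh
# Placeholder crontab entry: run the pipeline every 15 minutes.
# flock -n skips a cycle if the previous run is still going, which
# matters if a ~50 GB cycle overruns the 15-minute window.
*/15 * * * * flock -n /tmp/etl.lock /opt/etl/pipeline.sh >> /var/log/etl.log 2>&1
```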
After searching the internet, I found:
- ORAOOP to move data from Oracle to HDFS (have the code within a shell script and schedule it to run at the required interval).
- Do the large-scale processing with custom MapReduce, Hive, or Pig.
- SQOOP Teradata Connector to move data from HDFS to Teradata (again have the code in a shell script and schedule it; a rough sketch of such a script is shown after this list).
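To make this concrete, here is a rough sketch of what such a scheduled script could look like. All hosts, credentials, table names, HDFS paths, and the transform.hql Hive script are placeholders, and the exact flags depend on the connector builds installed (OraOop hooks into Sqoop's --direct mode; a dedicated Teradata connector may additionally require its own --connection-manager):

```sh
#!/bin/bash
set -e  # abort the cycle if any stage fails

TS=$(date +%Y%m%d%H%M)   # tag each 15-minute run
STAGE=/data/staging/$TS  # placeholder HDFS staging path

# 1. Oracle -> HDFS. With OraOop installed, Sqoop's --direct mode
#    routes the import through it automatically.
sqoop import --direct \
  --connect jdbc:oracle:thin:@//orahost:1521/ORCL \
  --username etl --password-file /user/etl/.orapass \
  --table SOURCE_TABLE \
  --target-dir "$STAGE/raw" \
  --num-mappers 8

# 2. Process on HDFS (Hive shown here; Pig or custom MapReduce
#    would slot in the same way).
hive --hiveconf input_dir="$STAGE/raw" \
     --hiveconf output_dir="$STAGE/out" \
     -f /opt/etl/transform.hql

# 3. HDFS -> Teradata via sqoop export.
sqoop export \
  --connect jdbc:teradata://tdhost/Database=edw \
  --username etl --password-file /user/etl/.tdpass \
  --table TARGET_TABLE \
  --export-dir "$STAGE/out"
```

Given roughly 50 GB per cycle, a single run may well exceed 15 minutes, which is why the cron entry above uses flock to skip a cycle rather than launch overlapping runs.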
Is this the right choice, and will it work within the required time window (note that this is not a daily batch)?
The other options I found are as follows:
- STORM (for real-time data processing), but I am not able to find an Oracle spout or Teradata bolt out of the box.
- Any open source ETL tools like Talend or Pentaho.
Please share your thoughts on these options, as well as any other possibilities.