  1. Apache IoTDB
  2. IOTDB-842

Better Export/Import-CSV Tool

Details

    • 2021-09

    Description

      Hi, our import-csv tool is currently implemented on top of JDBC and accepts only one rigid, dated format:

      e.g., 

      Time,root.sg.d1.s1,root.sg.d1.s2,root.sg.d2.s1,root.sg.d2.s2,root.sg.d2.s3
      2020-08-18T10:22:31.603+08:00,1,2.0,null,null,null
      2020-08-18T10:22:35.631+08:00,1,2.0,null,null,null
      2020-08-18T10:22:41.093+08:00,null,null,1,2.0,null
      2020-08-18T10:22:52.603+08:00,null,null,1,2.0,true
      

      Requirement 1:

      Since we support three output formats (align all series, which is the default; align by device; and without alignment), import-csv should accept the same three formats:

      a.

      Time,root.sg.d1.s1,root.sg.d1.s2,root.sg.d2.s1,root.sg.d2.s2,root.sg.d2.s3
      2020-08-18T10:22:31.603+08:00,1,2.0,null,null,null
      2020-08-18T10:22:35.631+08:00,1,2.0,null,null,null
      2020-08-18T10:22:41.093+08:00,null,null,1,2.0,null
      2020-08-18T10:22:52.603+08:00,null,null,1,2.0,true
      

      b.

      Time,Device,s1,s2,s3
      2020-08-18T10:22:31.603+08:00,root.sg.d1,1,2.0,null
      2020-08-18T10:22:35.631+08:00,root.sg.d1,1,2.0,null
      2020-08-18T10:22:41.093+08:00,root.sg.d2,1,2.0,null
      2020-08-18T10:22:52.603+08:00,root.sg.d2,1,2.0,true
      

      c.
      (The non-aligned format is odd; I would rather not support it.)
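
A minimal sketch of how an importer might accept both format a and format b, assuming the layout is detected from the header (a second column named `Device` means format b). The function name `read_points` is hypothetical and is not part of the existing tool; values are left as raw strings.

```python
import csv
import io


def read_points(text):
    """Yield (time, series_path, raw_value) for either header layout.

    Assumption: a header whose second column is "Device" means format b
    (align by device); anything else is treated as format a.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    by_device = len(header) > 1 and header[1] == "Device"
    for row in reader:
        if by_device:
            time, device = row[0], row[1]
            # Measurement names in the header are relative to the device.
            for name, value in zip(header[2:], row[2:]):
                yield time, f"{device}.{name}", value
        else:
            # Header columns already hold full series paths.
            for path, value in zip(header[1:], row[1:]):
                yield row[0], path, value


aligned = (
    "Time,root.sg.d1.s1,root.sg.d1.s2\n"
    "2020-08-18T10:22:31.603+08:00,1,2.0\n"
)
by_device = (
    "Time,Device,s1,s2\n"
    "2020-08-18T10:22:31.603+08:00,root.sg.d1,1,2.0\n"
)
# Both layouts normalize to the same list of points.
same = list(read_points(aligned)) == list(read_points(by_device))
```

Normalizing both layouts to one stream of points would let the rest of the import path stay format-agnostic.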

      Requirement 2:
      Different users may have different time formats in the first column.
      So we should support multiple time formats, e.g., let users specify the pattern used to parse their timestamps, such as yyyy-MM-ddHH:mm:ss.SSS.
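
As an illustration of user-defined timestamp parsing, here is a rough sketch in Python. The function `parse_time` and its `pattern` parameter are hypothetical; the pattern shown is the `strptime` analogue of yyyy-MM-ddHH:mm:ss.SSS, and treating timestamps without an offset as UTC is an assumption made only for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_time(text, pattern=None):
    """Return epoch milliseconds for a timestamp string.

    With no pattern, ISO 8601 is expected (the current default format).
    Assumption: timestamps without an offset are interpreted as UTC.
    """
    if pattern is None:
        dt = datetime.fromisoformat(text)
    else:
        dt = datetime.strptime(text, pattern)
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division avoids float rounding on the millisecond part.
    return (dt - EPOCH) // timedelta(milliseconds=1)


default_ms = parse_time("2020-08-18T10:22:31.603+08:00")
custom_ms = parse_time("2020-08-1802:22:31.603", "%Y-%m-%d%H:%M:%S.%f")
```

Here `default_ms` and `custom_ms` denote the same instant, since 10:22:31.603 at +08:00 is 02:22:31.603 UTC.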

      Requirement 3:
      Support NULL as well as an empty or blank field to represent a null data point. For example, the following three lines are equivalent:

      2020-08-18T10:22:31.603+08:00,root.sg.d1,1,null,null

      2020-08-18T10:22:31.603+08:00,root.sg.d1,1,,

      2020-08-18T10:22:31.603+08:00,root.sg.d1,1, ,
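
One possible way to treat the three spellings uniformly, shown as a sketch. The helper `is_null` is hypothetical, and matching `null` case-insensitively is an assumption rather than something this issue specifies.

```python
def is_null(field):
    """True for "null" (any case, an assumption), empty, or blank fields."""
    return field.strip().lower() in ("", "null")


lines = [
    "2020-08-18T10:22:31.603+08:00,root.sg.d1,1,null,null",
    "2020-08-18T10:22:31.603+08:00,root.sg.d1,1,,",
    "2020-08-18T10:22:31.603+08:00,root.sg.d1,1, ,",
]
# Replace every null-like field with None; keep other fields as raw text.
parsed = [
    [None if is_null(field) else field for field in line.split(",")]
    for line in lines
]
all_equal = parsed[0] == parsed[1] == parsed[2]
```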

      Requirement 4:

      Support declaring the storage group name once rather than repeating it on every line:

      e.g., for format b, we can tell the tool that the storage group is `root.sg`, and then each row looks like:

      2020-08-18T10:22:35.631+08:00,d1,1,2.0,null

      Another option is to add a new column called storage_group to each row.
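
A small sketch of the first option, assuming the storage group is handed to the tool once (for example as a command-line argument, which is hypothetical here) and prepended to each row's device name. The helper `full_device_path` is illustrative only.

```python
def full_device_path(device, storage_group=None):
    """Prefix the device with the storage group when one was declared."""
    if storage_group is None:
        # No storage group declared: the row must carry the full path.
        return device
    return f"{storage_group}.{device}"


row = "2020-08-18T10:22:35.631+08:00,d1,1,2.0,null".split(",")
device = full_device_path(row[1], storage_group="root.sg")
```

With `root.sg` declared, the short name `d1` resolves to `root.sg.d1`, matching the full path used in format b.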

      For unit tests:
      1. all data types should be covered;
      2. malformed CSV input should be covered.
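
As one example of the malformed-input cases a unit test could exercise, the sketch below flags rows whose column count differs from the header. The function `find_malformed_rows` is hypothetical, and requiring the first header column to be `Time` is an assumption taken from the examples in this issue.

```python
import csv
import io


def find_malformed_rows(text):
    """Return line numbers of rows whose width differs from the header."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if not header or header[0] != "Time":
        raise ValueError("first header column must be Time")
    # reader.line_num is the number of source lines read so far.
    return [reader.line_num for row in reader if len(row) != len(header)]


sample = (
    "Time,root.sg.d1.s1\n"
    "2020-08-18T10:22:31.603+08:00,1\n"
    "2020-08-18T10:22:35.631+08:00\n"
    "2020-08-18T10:22:41.093+08:00,1,2.0\n"
)
bad_lines = find_malformed_rows(sample)
```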

            People

              xuanronaldo Xuan Ronaldo
              hxd Xiangdong Huang