Details

    • Type: Sub-task
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • None
    • None
    • Component/s: SQL
    • None
    • Sprint: Spark 1.5 doc/QA sprint

    Description

      Create a list of functions that are on this page but not in SQL/DataFrame.

      https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF

      Here's the list of missing functions:

      basic

      between: added in 1.4
      bitwiseAND: added in 1.4
      bitwiseOR: added in 1.4
      bitwiseXOR: added in 1.4
      bitwiseNOT: added in 1.4
      

      math

      round(DOUBLE a)
      round(DOUBLE a, INT d) Returns a rounded to d decimal places.
      log2
      sqrt(string column name)
      bin
      hex(long), hex(string), hex(binary)
      unhex(string) -> binary
      conv
      pmod
      factorial
      toDeg -> toDegrees: added in 1.4
      toRad -> toRadians: added in 1.4
      e()
      pi()
      shiftleft(int or long)
      shiftright(int or long)
      shiftrightunsigned(int or long)
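
For reference, the semantics of `pmod` and the three shift functions can be sketched in plain Python. This is only an illustration, not Spark's or Hive's implementation: it assumes 32-bit `int` inputs with Java-style wraparound, and restricts `pmod` to a positive divisor. The helper `to_int32` is a hypothetical name introduced here.

```python
def to_int32(x: int) -> int:
    """Wrap a Python int to a signed 32-bit value, as a Java int would."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def shiftleft(a: int, n: int) -> int:
    # Only the low 5 bits of the shift count are used, as in Java.
    return to_int32(a << (n & 31))


def shiftright(a: int, n: int) -> int:
    # Arithmetic shift: the sign bit is preserved.
    return to_int32(a) >> (n & 31)


def shiftrightunsigned(a: int, n: int) -> int:
    # Logical shift: the value is treated as an unsigned 32-bit pattern.
    return to_int32((a & 0xFFFFFFFF) >> (n & 31))


def pmod(a: int, b: int) -> int:
    # Assumption: only b > 0 is handled in this sketch.
    if b <= 0:
        raise ValueError("this sketch only handles a positive divisor")
    return a % b  # Python's % is already non-negative for b > 0


print(shiftright(-8, 1), shiftrightunsigned(-8, 1), pmod(-7, 3))
```

The difference between `shiftright` and `shiftrightunsigned` only shows up for negative inputs, which is why the usage line uses `-8`.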
      

      collection functions

      sort_array(array)
      size(map or array)
      map_values(map<k,v>): array<v>
      map_keys(map<k,v>): array<k>
      array_contains(array<t>, value): boolean
      

      date functions

      from_unixtime(long, string): string
      unix_timestamp(): long
      unix_timestamp(date): long
      year(date): int
      month(date): int
      day(date): int
      dayofmonth(date): int
      hour(timestamp): int
      minute(timestamp): int
      second(timestamp): int
      weekofyear(date): int
      date_add(date, int)
      date_sub(date, int)
      from_utc_timestamp(timestamp, string timezone): timestamp
      current_date(): date
      current_timestamp(): timestamp
      add_months(string start_date, int num_months): string
      last_day(string date): string
      next_day(string start_date, string day_of_week): string
      trunc(string date[, string format]): string
      months_between(date1, date2): double
      date_format(date/timestamp/string ts, string fmt): string
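
As a rough guide to what two of the less obvious date functions are expected to return, here is a minimal Python sketch of `last_day` and `next_day`. It assumes ISO `yyyy-MM-dd` input strings and that `next_day` returns the first matching weekday strictly after the start date. It is not how Spark or Hive implement them, and the `_DAYS` table is a name made up for this example.

```python
import calendar
from datetime import date, timedelta

# Two-letter weekday prefixes, Monday first to match date.weekday().
_DAYS = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]


def last_day(d: str) -> str:
    """Return the last day of the month that contains d."""
    day = date.fromisoformat(d)
    end = calendar.monthrange(day.year, day.month)[1]
    return day.replace(day=end).isoformat()


def next_day(start_date: str, day_of_week: str) -> str:
    """Return the first date after start_date that falls on day_of_week."""
    start = date.fromisoformat(start_date)
    target = _DAYS.index(day_of_week[:2].upper())
    # 1..7 days ahead, so the result is always strictly later than start.
    delta = (target - start.weekday() - 1) % 7 + 1
    return (start + timedelta(days=delta)).isoformat()


print(last_day("2016-02-10"), next_day("2015-01-14", "TU"))
```

Matching on the first two letters lets the sketch accept `TU`, `Tue`, and `Tuesday` alike.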
      

      conditional functions

      if(boolean testCondition, T valueTrue, T valueFalseOrNull): T
      nvl(T value, T default_value): T
      greatest(T v1, T v2, …): T
      least(T v1, T v2, …): T
      

      string functions

      ascii(string str): int
      base64(binary): string
      concat(string|binary A, string|binary B…): string | binary
      concat_ws(string SEP, string A, string B…): string
      concat_ws(string SEP, array<string>): string
      decode(binary bin, string charset): string
      encode(string src, string charset): binary
      find_in_set(string str, string strList): int
      format_number(number x, int d): string
      length(string): int
      instr(string str, string substr): int
      locate(string substr, string str[, int pos]): int
      lower(string), lcase(string): string
      lpad(string str, int len, string pad): string
      ltrim(string): string
      
      parse_url(string urlString, string partToExtract [, string keyToExtract]): string
      printf(String format, Obj... args): string
      regexp_extract(string subject, string pattern, int index): string
      regexp_replace(string INITIAL_STRING, string PATTERN, string REPLACEMENT): string
      repeat(string str, int n): string
      reverse(string A): string
      rpad(string str, int len, string pad): string
      space(int n): string
      split(string str, string pat): array
      str_to_map(text[, delimiter1, delimiter2]): map<string, string>
      trim(string A): string
      unbase64(string str): binary
      upper(string A), ucase(string A): string
      levenshtein(string A, string B): int
      soundex(string A): string
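
Two of the string functions above have semantics that are easy to get wrong, so a small Python sketch may help when writing tests for them. It assumes `levenshtein` is the standard edit distance (insert, delete, substitute, each at cost 1) and that `find_in_set` is 1-based, returning 0 when the item is absent or contains a comma. Null handling is left out, and none of this is Spark's actual code.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = cur
    return prev[-1]


def find_in_set(s: str, str_list: str) -> int:
    """1-based position of s in a comma-delimited list, or 0."""
    if "," in s:
        return 0
    items = str_list.split(",")
    return items.index(s) + 1 if s in items else 0


print(levenshtein("kitten", "sitting"), find_in_set("ab", "abc,b,ab,c,def"))
```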
      

      Misc

      hash(a1[, a2…]): int
      

      text

      context_ngrams(array<array<string>>, array<string>, int K, int pf): array<struct<string,double>>
      ngrams(array<array<string>>, int N, int K, int pf): array<struct<string,double>>
      sentences(string str, string lang, string locale): array<array<string>>
      

      UDAF

      var_samp
      stddev_pop
      stddev_samp
      covar_pop
      covar_samp
      corr
      percentile: array<double>
      percentile_approx: array<double>
      histogram_numeric: array<struct {'x','y'}>
      collect_set (we have hashset)
      collect_list 
      ntile
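
The `_pop` and `_samp` variants in this list differ only in the divisor (n versus n - 1). The following Python sketch spells that out for variance, standard deviation, covariance, and correlation. It works on in-memory lists and is meant only as a reference for expected values, not as a description of how Spark computes these aggregates. `var_pop` and the private helpers `_mean` and `_co_moment` are names added here for illustration.

```python
import math


def _mean(xs):
    return sum(xs) / len(xs)


def _co_moment(xs, ys):
    # Sum of products of deviations from the means.
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys))


def covar_pop(xs, ys):
    return _co_moment(xs, ys) / len(xs)


def covar_samp(xs, ys):
    return _co_moment(xs, ys) / (len(xs) - 1)


def var_pop(xs):
    return covar_pop(xs, xs)


def var_samp(xs):
    return covar_samp(xs, xs)


def stddev_pop(xs):
    return math.sqrt(var_pop(xs))


def stddev_samp(xs):
    return math.sqrt(var_samp(xs))


def corr(xs, ys):
    # Pearson correlation; the n or n - 1 divisors cancel out.
    return _co_moment(xs, ys) / math.sqrt(_co_moment(xs, xs) * _co_moment(ys, ys))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(var_pop(data), stddev_pop(data), var_samp(data))
```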
      


            People

              rxin Reynold Xin