Details
-
Bug
-
Status: Closed
-
Major
-
Resolution: Fixed
-
None
-
None
-
None
Description
Bourne shell has a toolkit of commands that allow you to do relational processing: grep = WHERE, sort = ORDER BY, uniq = SELECT DISTINCT, head = OFFSET/LIMIT, and so forth. Shell hackers are accustomed to writing pipelines. For example, to find the 3 largest files one would type
$ find . -type f -print0 |xargs -0 ls -l | sort -nr -k 5 | head -3 -rw-r--r-- 1 jhyde staff 416028 Jul 16 10:06 ./core/src/test/java/org/apache/calcite/test/SqlValidatorTest.java -rw-r--r-- 1 jhyde staff 325727 Jun 28 11:48 ./core/src/test/java/org/apache/calcite/test/JdbcTest.java -rw-r--r-- 1 jhyde staff 325106 Jul 16 10:06 ./core/src/test/resources/org/apache/calcite/test/RelOptRulesTest.xml
SQL is not quite as concise but nevertheless:
$ sqlsh select \* from files where not directory order by size desc limit 3
sqlsh is "SQL shell", a wrapper around Calcite JDBC. files is a view backed by a table function that calls "find .". Other commands:
- du table function - e.g. select path, size_k from du
- git_ls_files table function calls git-ls-files - e.g. select * from git_ls_files
- ps table function calls ps aux - e.g. select * from ps
- wc function calls wc - e.g. select path, lineCount from git_ls_files cross apply wc(path)
SQL would run in a lexical mode which is case-sensitive, and identifiers are not upper-cased if not quoted.
We could consider allowing shell-safe characters such as '-' in unquoted identifiers. (It's difficult to quote identifiers in SQL if the SQL has already passed through bash's quote handling, and had double-quotes and single-quotes removed.)
It gets really interesting when commands accept arguments, and Calcite pushes down filters to become those arguments. For example, in
sqlsh select distinct author from git_ls_files join git_commit_files using \(path\) join git_commits using \(commit\)
Calcite should rewrite to use git log.
When accessing files, some kinds of files have implicit names (and types) for fields, and some don't. You should be able to access fields by name or position; the following are equivalent:
sqlsh select gid from /etc/passwd where uid = 100 sqlsh select $4 from /etc/passwd where $3 = 100