Description
Apache Doris
Apache Doris is a real-time analytical database based on MPP architecture. As a unified platform that supports multiple data processing scenarios, it ensures high performance for low-latency and high-throughput queries, allows for easy federated queries on data lakes, and supports various data ingestion methods.
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Background
In Apache Doris, dictionary encoding is performed during data writing and compaction. Dictionary encoding will be implemented on string data types by default. The dictionary size of a column for one segment is 1M at most. The dictionary encoding technology accelerates strings during queries, converting them into INT, for example.
Task
- Phase One: Get familiar with the implementation of Apache Doris dictionary encoding; learning how Apache Doris dictionary encoding accelerates queries.
- Phase Two: Evaluate the effectiveness of full dictionary encoding and figure out how to optimize memory in such a case.
Learning Material
Page: https://doris.apache.org
Github: https://github.com/apache/doris
Mentor
- Mentor: Chen Zhang, Apache Doris Committer, zhangchen@apache.org
- Mentor: Zhijing Lu, Apache Doris Committer, luzhijing@apache.org
- Mailing List: dev@doris.apache.org