Query Engines
→ StarRocks
→ Resources
- deploying StarRocks
- Okta integration
- Hive statistics provider
- query cache
- decoupled storage
- local cache
- spill to disk
- monitoring queries
- why StarRocks outperforms Spark
- Ranger integration of Doris
→ configs
- SET sql_dialect = 'trino': Trino dialect, see …
- SET enable_pipeline_engine=true: for query cache
- SET enable_storage_cache=true: for decoupled storage local cache
- SET enable_populate_block_cache=true: for block cache
- SET enable_spill=true: also configure spill_local_storage_dir on the BE
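The SET statements above change a variable for the current session only. A minimal sketch of the other scopes, assuming StarRocks' MySQL-compatible SET GLOBAL and SHOW VARIABLES syntax and its SET_VAR query hint; the table orders and its columns are hypothetical:

```sql
-- Current session only
SET enable_spill = true;

-- Cluster-wide default, picked up by sessions opened afterwards
SET GLOBAL enable_spill = true;

-- Check the value the current session sees
SHOW VARIABLES LIKE 'enable_spill';

-- Single statement only, via a hint (orders is a hypothetical table)
SELECT /*+ SET_VAR(enable_spill = true) */ customer_id, sum(amount)
FROM orders
GROUP BY customer_id;
```

spill_local_storage_dir is a BE configuration item (be.conf), not a session variable, so it is not set through these statements.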
→ Hudi
→ external catalog
aws.s3.enable_ssl
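A minimal sketch of where aws.s3.enable_ssl sits in a StarRocks Hudi catalog backed by S3-compatible storage (for example MinIO), assuming a Hive metastore; the catalog name, hosts, and credentials are hypothetical placeholders:

```sql
-- Hudi catalog on a Hive metastore, data on S3-compatible storage.
-- enable_ssl is false here because the hypothetical endpoint is plain http.
CREATE EXTERNAL CATALOG hudi_catalog_hms
PROPERTIES (
    "type" = "hudi",
    "hive.metastore.type" = "hive",
    "hive.metastore.uris" = "thrift://metastore-host:9083",
    "aws.s3.enable_ssl" = "false",
    "aws.s3.enable_path_style_access" = "true",
    "aws.s3.endpoint" = "http://minio-host:9000",
    "aws.s3.access_key" = "<access_key>",
    "aws.s3.secret_key" = "<secret_key>"
);

-- List databases through the new catalog
SHOW DATABASES FROM hudi_catalog_hms;
```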
→ metadata sync
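A minimal sketch of a manual metadata refresh, assuming the REFRESH EXTERNAL TABLE statement StarRocks documents for Hive-metastore-based catalogs; the catalog, database, table, and partition names are hypothetical:

```sql
-- Re-read the metadata of one table from the metastore
REFRESH EXTERNAL TABLE hudi_catalog_hms.sales_db.orders;

-- Or limit the refresh to selected partitions
REFRESH EXTERNAL TABLE hudi_catalog_hms.sales_db.orders PARTITION ('dt=2023-01-01');
```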
→ code
- Hudi reader
- the Hudi table creates a Thrift Hudi table
- this returns the Hudi files, based on the timeline
- this works on the files
- there is likely an option to use the JNI reader
- Hudi falls back to Hive statistics
So the FE gets the file listing for partitions at runtime. Depending on the table type (copy-on-write or merge-on-read), it merges log files or not. It can also get the file listing from the Hudi metadata table.
→ caching
There are 3 mechanisms:
- Query cache: only used with native tables. It stores intermediate results, not the final ones.
- Storage cache: only used with native tables stored on cloud storage. It also stores newly written data locally.
- Block cache: only used for external tables on cloud storage. It caches files locally, on disk or in RAM.
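A minimal sketch of the query cache on a native table, assuming the enable_query_cache session variable from the StarRocks query cache docs (variable names can differ between versions); the table sales and its columns are hypothetical:

```sql
-- The query cache runs on the pipeline engine
SET enable_pipeline_engine = true;
SET enable_query_cache = true;

-- Aggregation over a native table: the intermediate (per-tablet)
-- aggregates are cached, not the final result set
SELECT dt, count(*) AS order_cnt
FROM sales
WHERE dt >= '2023-01-01'
GROUP BY dt;
```

Re-running the same aggregation, or one that scans the same tablets, is the case that should benefit from the cached intermediate results.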
→ Athena
From the docs, "Use ORC for complex types": "Currently, when you query columns stored in Parquet that have complex data types (for example, array, map, or struct), Athena reads an entire row of data instead of selectively reading only the specified columns. This is a known issue in Athena. As a workaround, consider using ORC."
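A minimal sketch of that workaround with an Athena CTAS statement that rewrites a Parquet table as ORC; the table names and the S3 location are hypothetical:

```sql
-- Write a copy of the Parquet table in ORC format.
-- external_location must point to an empty S3 prefix.
CREATE TABLE events_orc
WITH (
    format = 'ORC',
    external_location = 's3://example-bucket/events_orc/'
) AS
SELECT *
FROM events_parquet;
```

Queries that select nested columns would then target events_orc instead of events_parquet.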
→ JDBC
Details in the Simba JDBC manual
jdbc:awsathena://User=[AccessKey];Password=[SecretKey];S3OutputLocation=[Output];[Property1]=[Value1];[Property2]=[Value2];...
- enableResultReuseByAge (0/1): specifies whether the connector reuses the results of a previous run of the same query.
- maxResultReuseAgeInMinutes: specifies the maximum age, in minutes, of a previous query result that the connector considers for reuse. The range is 0 to 10080 minutes.
→ Dremio
→ Pinot