Query Engines

→ StarRocks

→ Resources

→ configs

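SET sql_dialect = 'trino': accept queries written in Trino SQL syntax.
SET enable_pipeline_engine = true: required for the query cache.
SET enable_storage_cache = true: local cache for decoupled (shared-data) storage.
SET enable_populate_block_cache = true: fill the block cache.
SET enable_spill = true: enable spilling to disk; spill_local_storage_dir must also be configured on the BE.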

→ Hudi

→ external catalog

aws.s3.enable_ssl

→ metadata sync

→ code

The FE fetches the file listing for each partition at query time. Depending on the table type (Copy-on-Write or Merge-on-Read), log files are merged into the read or not. The listing can also come from the Hudi metadata table instead of listing storage directly.
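To make the Copy-on-Write vs Merge-on-Read distinction concrete, here is a toy model of per-partition file planning. It is not StarRocks' or Hudi's code: `HudiFile`, `FileSlice` and `plan_partition` are hypothetical names, file metadata is passed in directly instead of being parsed from Hudi file names, and where the listing comes from (a storage LIST or the Hudi metadata table) is out of scope. File groups that only have log files are ignored for simplicity.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Toy model (assumption): real Hudi encodes file id and instant in file
# names and versions its log files; those details are left out here.
@dataclass(frozen=True)
class HudiFile:
    file_id: str   # file group the file belongs to
    instant: str   # base file: its commit time; log file: the base commit it applies to
    kind: str      # "base" or "log"
    path: str

@dataclass
class FileSlice:
    file_id: str
    base: HudiFile
    logs: List[HudiFile] = field(default_factory=list)

def plan_partition(files: List[HudiFile], table_type: str) -> List[FileSlice]:
    """Return the latest file slice of each file group in one partition."""
    if table_type not in ("COPY_ON_WRITE", "MERGE_ON_READ"):
        raise ValueError(f"unknown table type: {table_type}")
    # Keep the newest base file per file group; instants are fixed-width
    # timestamps, so string comparison orders them chronologically.
    latest: Dict[str, HudiFile] = {}
    for f in files:
        if f.kind == "base":
            cur = latest.get(f.file_id)
            if cur is None or f.instant > cur.instant:
                latest[f.file_id] = f
    slices: List[FileSlice] = []
    for file_id in sorted(latest):
        base = latest[file_id]
        logs: List[HudiFile] = []
        if table_type == "MERGE_ON_READ":
            # Only the logs written on top of the chosen base file are merged;
            # a Copy-on-Write read uses the base file alone.
            logs = sorted(
                (f for f in files
                 if f.kind == "log" and f.file_id == file_id and f.instant == base.instant),
                key=lambda f: f.path,
            )
        slices.append(FileSlice(file_id, base, logs))
    return slices
```

For the same listing, a Copy-on-Write plan yields only base files, while a Merge-on-Read plan also carries the log files the reader has to merge at query time.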

→ caching

There are 3 mechanisms:

Query cache: only used with native tables. It stores intermediate results, not final ones.
Storage cache: only used with native tables stored on cloud storage. It also keeps a local copy of newly written data.
Block cache: only used with external tables on cloud storage. It caches remote file data locally, on disk or in RAM.
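The three rules can be encoded as a small lookup, which makes it easy to check which cache can help a given table. A sketch only: `TableKind` and `applicable_caches` are hypothetical names, and the function encodes exactly the rules listed here; whether a cache is actually switched on still depends on session variables such as those in the configs notes.

```python
from enum import Enum
from typing import List

class TableKind(Enum):
    NATIVE = "native"      # tables managed by StarRocks itself
    EXTERNAL = "external"  # tables read through an external catalog (e.g. Hudi)

def applicable_caches(kind: TableKind, on_cloud_storage: bool) -> List[str]:
    """Caches that can apply to a table, per the three rules in these notes."""
    caches: List[str] = []
    if kind is TableKind.NATIVE:
        caches.append("query cache")        # native tables only
        if on_cloud_storage:
            caches.append("storage cache")  # native tables on cloud storage
    elif on_cloud_storage:
        caches.append("block cache")        # external tables on cloud storage
    return caches
```

For example, `applicable_caches(TableKind.EXTERNAL, True)` returns `["block cache"]`: an external Hudi table on S3 can only benefit from the block cache.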

→ Athena

From the docs: "Use ORC for complex types. Currently, when you query columns stored in Parquet that have complex data types (for example, array, map, or struct), Athena reads an entire row of data instead of selectively reading only the specified columns. This is a known issue in Athena. As a workaround, consider using ORC."

→ JDBC

Details are in the Simba JDBC driver manual.

jdbc:awsathena://User=[AccessKey];Password=[SecretKey];S3OutputLocation=[Output];[Property1]=[Value1];[Property2]=[Value2];...
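The URL template is plain `key=value` pairs separated by `;`, so it can be assembled with a small helper. A minimal sketch: `athena_jdbc_url` is a hypothetical name, it only does string assembly following the template above, and it does not check property names against the driver.

```python
from typing import Dict, Optional

def athena_jdbc_url(access_key: str, secret_key: str, output: str,
                    extra: Optional[Dict[str, str]] = None) -> str:
    """Build an Athena JDBC URL following the template in these notes."""
    props: Dict[str, str] = {
        "User": access_key,
        "Password": secret_key,
        "S3OutputLocation": output,
    }
    for key, value in (extra or {}).items():
        if key in props:
            raise ValueError(f"property set twice: {key}")
        props[key] = value
    for key, value in props.items():
        # ';' separates properties and '=' separates key from value,
        # so reject them where they would break the URL.
        if ";" in key or "=" in key or ";" in value:
            raise ValueError(f"unsafe character in property {key!r}")
    return "jdbc:awsathena://" + ";".join(f"{k}={v}" for k, v in props.items())
```

Since `User` and `Password` carry the AWS access and secret keys, avoid logging the resulting URL.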

enableResultReuseByAge (0 or 1): whether the connector reuses previous results when the same query is run again.
maxResultReuseAgeInMinutes: the maximum age, in minutes, of a previous query result that can be reused. Valid range: 0 to 10080 minutes (7 days).
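A small sketch that builds the two result-reuse properties and enforces the documented 0 to 10080 range before they reach the connection string. `result_reuse_properties` and `to_url_suffix` are hypothetical helpers; the property names are spelled as in these notes, and whether the driver matches them case-insensitively is not checked here.

```python
from typing import Dict

# Upper bound from the manual: 10080 minutes = 7 * 24 * 60 = 7 days.
MAX_REUSE_AGE_MINUTES = 10080

def result_reuse_properties(enabled: bool, max_age_minutes: int) -> Dict[str, str]:
    """Return the two result-reuse connection properties as strings."""
    if not 0 <= max_age_minutes <= MAX_REUSE_AGE_MINUTES:
        raise ValueError(
            f"maxResultReuseAgeInMinutes must be in [0, {MAX_REUSE_AGE_MINUTES}], "
            f"got {max_age_minutes}"
        )
    return {
        "enableResultReuseByAge": "1" if enabled else "0",
        "maxResultReuseAgeInMinutes": str(max_age_minutes),
    }

def to_url_suffix(props: Dict[str, str]) -> str:
    """Render properties as ';key=value' pairs to append to a JDBC URL."""
    return "".join(f";{k}={v}" for k, v in props.items())
```

For example, `to_url_suffix(result_reuse_properties(True, 60))` yields `;enableResultReuseByAge=1;maxResultReuseAgeInMinutes=60`, ready to append after `S3OutputLocation=...`.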

→ Dremio

Benchmarks vs Presto

→ Pinot

Uber usage

React?

This page was last modified: 2024-09-01 21:30