Estimated Time: 2 minutes
FHIR Search Engine - Resource Operators
Technical Consideration
- solr de-normalization gives faster selection queries and greater expressivity
- solr de-normalization gives slower updates and adds complexity
- solr-join can be used in solr-cloud - the joined collection SHALL be replicated over every shard.
- solr-join performance is linear in the size of the joined collection (local parameter: score=none)
- solr-join query time is up to 1 s on a 1M-document collection
- solr-join only implements inner joins, without access to the fields of the joined collection
- spark-solr provides fast access to solr-cloud fields with docValues=true or type string, long, timestamp
- spark-solr produces a dataframe from a solr-cloud collection through the /export streaming-expression handler
- spark-solr supports any query filters to subset the collection
- spark-solr offers a large set of joins, intersections, unions, aggregations and windowing operations
- fhir _query search allows top-down search - Observation from a patient with a given identifier
- fhir _has allows bottom-up search - Patient having an Observation with a given coding
- fhir _graphql does not allow temporal operators (?)
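As a sketch, reading a solr-cloud collection into a dataframe with spark-solr looks like the fragment below; the zkhost and collection names are placeholders and a running SolrCloud is assumed:

```scala
// Sketch of a spark-solr read; zkhost and collection name are placeholders.
// spark-solr streams fields with docValues=true through the /export handler.
val observations = spark.read.format("solr")
  .option("zkhost", "zk1:2181/solr")
  .option("collection", "observation")
  .option("query", "status:final") // any Solr query filter to subset the collection
  .load()
```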
Technical Conclusion
- solr-join is not compatible with large collections
- top-down search can be implemented with solr de-normalization or solr-join
- top-down search prefers solr de-normalization since the joined collections are huge (patient/encounter)
- the de-normalization can cover patient/encounter details - spark can fetch them from the patient/encounter delta files (including _lastUpdated)
- the _has covers the and operator but lacks or, except, after, before, not, count and operator scoping with parentheses
- the _has could be implemented by creating a dedicated collection with all resources in it.
- the _has could be implemented with spark-solr with and operators only (patient left-semi joins)
- adding a long patient field to the fhir patient resource would be useful
- faceting requires unique faceting fields and de-normalized fields (extension in the bundle)
- [Base]/Condition?patient.identifier=xxx&encounter.length=10
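Run against a de-normalized collection, the example query above amounts to turning each chained parameter into a Solr filter on a flattened field. A minimal sketch, assuming a hypothetical `.` -> `_` flattening convention (the function name and field mapping are not part of the design above):

```scala
// Hypothetical translation of chained FHIR parameters into Solr filter
// queries on a de-normalized collection; the '.' -> '_' flattening is an assumption.
def toSolrFilters(params: Map[String, String]): Seq[String] =
  params.map { case (field, value) => s"${field.replace('.', '_')}:$value" }.toSeq.sorted

toSolrFilters(Map("patient.identifier" -> "xxx", "encounter.length" -> "10"))
// -> Seq("encounter_length:10", "patient_identifier:xxx")
```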
Implementation
It is possible to build the solr query with spring-data-solr. This query can then be passed to spark-solr.
fhir-syntax -> solr-syntax
- (Condition?active=false) AFTER(24, recorded-date, created-date) (Composition?status=canceled&_text=patient&type:below=letter)
- ConditionAphp(q=*:*&active=true) AFTER(encounter, 24, recorded-date, created-date) CompositionAphp(q=*:*&status=canceled)
- ConditionAphp(q=*:*&active=true) AND(encounter) CompositionAphp(q=*:*&status=canceled)
- AFTER(AND(Condition(query), Composition(query), encounter), Medication(query), patient, "start-date", "end-date")
- AND(Dataframe, Dataframe, Column)
- OR(Dataframe, Dataframe, Column)
- EXCEPT(Dataframe, Dataframe)
- AFTER(Dataframe, Dataframe, column, column, column)
- OCCURRENCE(Dataframe, number)
- NOT_IN(Dataframe, Dataframe)
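These operators can be prototyped on plain Scala sets before being wired to Spark dataframes; the sketch below shows the intended patient-level set semantics only (in the engine they run as Spark joins on the patient column):

```scala
// Patient-level set semantics of the operators, sketched on Scala Sets.
// In the engine these run as Spark joins on the "patient" column.
def AND(a: Set[Int], b: Set[Int]): Set[Int]    = a intersect b
def OR(a: Set[Int], b: Set[Int]): Set[Int]     = a union b
def EXCEPT(a: Set[Int], b: Set[Int]): Set[Int] = a diff b // left-anti join
def NOT_IN(a: Set[Int], b: Set[Int]): Set[Int] = a diff b // same set semantics as EXCEPT

AND(Set(1, 2, 3), Set(2, 3, 6))    // Set(2, 3)
EXCEPT(Set(1, 2, 3), Set(2, 3, 6)) // Set(1)
```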
import org.springframework.data.solr.core.DefaultQueryParser;
import org.springframework.data.solr.core.mapping.SimpleSolrMappingContext;

DefaultQueryParser b = new DefaultQueryParser(new SimpleSolrMappingContext());
String z = b.createQueryStringFromNode(search.getCriteria(), null);
The flow is as follows:
fhir-client: fhir-syntax operators -> fhir-backend: fhir-syntax operators => solr-syntax operators -> spark-backend: parser, joins, results -> create a patient list within solr
Group versus List
- it is possible to filter resources by Group thanks to reverse chaining
- [Base]/Patient?_has:Group:member:_id=xxxx
- [Base]/Encounter?patient._has:Group:member:_id=xxxx
- [Base]/Observation?patient._has:Group:member:_id=xxxx
- [Base]/Composition?patient._has:Group:member:_id=xxxx
- it is possible to filter resources by List thanks to _list parameter:
- [Base]/Patient?_list=xxxx where xxxx is the id of a List of patients
- [Base]/Encounter?_list=xxxx where xxxx is the id of a List of encounters
Results Information
- the resulting json should contain:
- the overall results
- the results restricted to the practitioner's ward
- the faceting results
Most relevant FHIR resources have two, three or four of these elements:
- id
- patient
- encounter
- date
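These four elements suggest a common row shape for the engine to operate on; the case class below is a hypothetical sketch (name and types are assumptions), with optional fields since presence varies by resource type:

```scala
import java.sql.Timestamp

// Hypothetical common row shape; field presence varies by resource type.
case class ResourceRow(id: String, patient: Long, encounter: Option[Long], date: Option[Timestamp])

val row = ResourceRow("obs-1", 42L, Some(10L), Some(new Timestamp(0L)))
```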
Patient Level
Since every filtered resource is linked to a patient, it is possible to operate on them all:
val r0 = (1::2::3::6::7::Nil).toDF("patient")
val r1 = (1::2::3::Nil).toDF("patient")
val r2 = (2::3::6::Nil).toDF("patient")
val r3 = (1::3::7::Nil).toDF("patient")
- `r1 AND r2 AND NOT r3`: r1.intersect(r2).join(r3,"patient"::Nil,"leftanti").dropDuplicates.show
- `r1 OR r2 OR NOT r3`: r1.union(r2).union(r0.join(r3,"patient"::Nil,"leftanti")).dropDuplicates.show
- `(r1 OR r2) AND r3`: r3.intersect(r1.union(r2)).dropDuplicates.show
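The three expressions above can be replayed on plain Scala sets with the same data, to make the expected patient lists explicit:

```scala
// Same data as the dataframes above, as plain Scala Sets.
val s0 = Set(1, 2, 3, 6, 7)
val s1 = Set(1, 2, 3)
val s2 = Set(2, 3, 6)
val s3 = Set(1, 3, 7)

val andNot = (s1 intersect s2) diff s3      // r1 AND r2 AND NOT r3 -> Set(2)
val orNot  = s1 union s2 union (s0 diff s3) // r1 OR r2 OR NOT r3   -> Set(1, 2, 3, 6)
val orAnd  = s3 intersect (s1 union s2)     // (r1 OR r2) AND r3    -> Set(1, 3)
```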
Encounter Level
val r1 = ((1,10)::(1,11)::(3,12)::Nil).toDF("patient", "encounter")
val r2 = ((1,11)::(3,12)::Nil).toDF("patient", "encounter")
val r3 = (1::3::7::Nil).toDF("patient")
- `r1 SAME_ENC r2`: r1.join(r2, "encounter"::Nil, "leftsemi").select("patient")
- `r1 SAME_ENC NOT r2`: r1.join(r2, "encounter"::Nil, "leftanti").select("patient")
- `r1 SAME_ENC r2 OR_PAT r3`: r1.join(r2, "encounter"::Nil, "leftsemi").select("patient").union(r3)
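The SAME_ENC semantics, replayed on plain collections: keep the rows of r1 whose encounter also appears in r2 (left-semi) or does not (left-anti), then project the patient column:

```scala
// (patient, encounter) rows, same data as the dataframes above.
val e1 = Seq((1, 10), (1, 11), (3, 12))
val e2 = Seq((1, 11), (3, 12))
val enc2 = e2.map(_._2).toSet // encounters present in r2

val same    = e1.filter(r => enc2(r._2)).map(_._1).distinct    // left-semi  -> List(1, 3)
val notSame = e1.filterNot(r => enc2(r._2)).map(_._1).distinct // left-anti  -> List(1)
```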
Date Level
val r1 = ((1,new Date(1L))::(1,new Date(2L))::Nil).toDF("patient", "date")
val r2 = ((1,new Date(1L))::(3,new Date(2L))::Nil).toDF("patient", "date")
- `r1 AFTER r2`: r1.as("r1").join(r2.as("r2"), "patient"::Nil, "inner").filter(col("r1.date") > col("r2.date")).select("patient")
- `r1 AFTER_24h r2`: r1.as("r1").join(r2.as("r2"), "patient"::Nil, "left").filter(col("r1.date").>=(expr("r2.date + INTERVAL 24 HOURS")))
- `r1 NOT AFTER_24h r2`: r1.as("r1").join(r2.as("r2"), "patient"::Nil, "full").filter(col("r1.date").isNull || !col("r1.date").>=(expr("r2.date + INTERVAL 24 HOURS")))
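The AFTER_24h semantics on plain collections: a patient matches when one of its dates in r1 is at least 24 hours after one of its dates in r2 (timestamps as plain milliseconds here, rather than java.sql.Date):

```scala
// (patient, date-in-millis) rows; a patient matches AFTER_24h when it has a
// date in d1 at least 24h after one of its dates in d2.
val dayMs = 24L * 60 * 60 * 1000
val d1 = Seq((1, 0L), (1, 2 * dayMs), (3, dayMs))
val d2 = Seq((1, 0L), (3, dayMs))

val after24h = (for {
  (p1, t1) <- d1
  (p2, t2) <- d2
  if p1 == p2 && t1 >= t2 + dayMs
} yield p1).distinct // List(1)
```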
All Together
Number Of Occurrence
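The OCCURRENCE(Dataframe, number) operator listed earlier reduces, at patient level, to keeping patients with at least n matching rows; in Spark this would be a groupBy("patient").count() followed by a filter. A plain-Scala sketch of that semantics:

```scala
// OCCURRENCE: keep patients that appear at least n times in the result rows.
def occurrence(rows: Seq[Int], n: Int): Set[Int] =
  rows.groupBy(identity).collect { case (p, hits) if hits.size >= n => p }.toSet

occurrence(Seq(1, 1, 2, 3, 3, 3), 2) // Set(1, 3)
```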