#metabase #superset #nginx #postgres #grafana #neo4j
shiny proxy
- using oidc information, an expression can be used to share a volume between users within the same team
- keycloak integration
- sharing containers among users
- custom html
- can set the container user via --docker-user
- docker images repository
- container proxy repos
proxy.allow-transfer-app
: transfer the app to another user? also present in the modal form and in the tests
track-app-url
: show the full url for a given app
always-show-switch-instance
: show the modal for a given app
Debug mode:
logging:
  requestdump: true
  level:
    root: TRACE
→ sp server client call
we might fetch the user groups from the client and generate a form on the fly. we need a userInfo endpoint. Or we can enrich the prepared map here.
- it is possible to generate a form and call the server api with js
- this is the admin endpoint to fetch data for the admin panel: the init calls getAdminData() and returns a datatable, then the admin template calls the init
- for the app start
→ Superset
Overall superset does not support a base url (path prefix), so it's a pain to integrate with SP
→ Metabase
- supports a base url
- has CSP protection, so it does not work in an iframe. The enterprise version can bypass the limitation, likely for a feature to embed dashboards in other applications.
- some buttons are still missing (admin/settings), so there is some work to figure out why -> it was because of iframe detection, which switches metabase to embedded mode.
- there is no anonymous access, so the user would have to log in twice. Too bad. -> use an api connection with lua (see below)
- pre-configuration (creation of users, connections) might be done on the h2 database with a python client?
→ disable CSP
- It is very easy to build a custom metabase and remove that security
- Leverage an nginx reverse proxy to hide the CSP headers (see the sketch after this list)
The second option looks better:
- no custom metabase (MT) build needed and no patch to maintain
- general solution, reusable for integrating other tools
- reuse the nginx to disable the login form, see below
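A minimal sketch of the header-stripping location; the directives are standard nginx, the upstream address is an assumption:
location / {
    proxy_pass http://127.0.0.1:3000;
    # drop the headers that prevent embedding metabase in an iframe
    proxy_hide_header Content-Security-Policy;
    proxy_hide_header X-Frame-Options;
}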
→ skip login
One idea is to call the metabase login api and create a cookie, transferred by nginx.
- the h2 database would be pre-initialized with an admin/admin user
- the entrypoint would copy the db into the user-mounted folder if it does not exist, before starting MT (see the sketch below)
- that user/pass would be used by the script for the api call
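A rough entrypoint sketch, assuming the seed h2 file ships in the image under /opt/seed and the user volume is mounted at /home/user/metabase (the paths are illustrative; MB_DB_FILE is the metabase variable pointing at the h2 file):
#!/bin/sh
# copy the pre-initialized h2 db only on first start
if [ ! -f /home/user/metabase/metabase.db.mv.db ]; then
  cp /opt/seed/metabase.db.mv.db /home/user/metabase/
fi
# point metabase at the user copy and start it
export MB_DB_FILE=/home/user/metabase/metabase.db
exec java -jar /app/metabase.jar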
OpenResty is an nginx distribution which includes the LuaJIT interpreter for Lua scripts
FROM openresty/openresty:buster-fat
RUN opm install ledgetech/lua-resty-http thapakazi/lua-resty-cookie
COPY default.conf /etc/nginx/conf.d/
COPY *.lua /usr/local/openresty/nginx/
COPY nginx.conf /usr/local/openresty/nginx/conf/nginx.conf
server {
    listen 8080;
    server_name your.metabase.domain;
    location / {
        access_by_lua_file gen_token.lua;
        proxy_pass http://127.0.0.1:3000;
    }
}
local cjson = require("cjson")
local httpc = require("resty.http").new()
local ck = require("resty.cookie")

local cookie, err = ck:new()
if not cookie then
    ngx.log(ngx.ERR, err)
    return
end

-- if the browser already has a metabase session, do nothing
local field, err = cookie:get("metabase.SESSION")
if not field then
    -- log in against the metabase api with the pre-provisioned credentials
    local res, err = httpc:request_uri("http://127.0.0.1:3000/api/session", {
        method = "POST",
        body = cjson.encode({
            username = os.getenv("METABASE_USERNAME"),
            password = os.getenv("METABASE_PASSWORD"),
        }),
        headers = {
            ["Content-Type"] = "application/json",
        },
    })
    if not res then
        ngx.log(ngx.ERR, "request failed: ", err)
        return
    end
    local data = cjson.decode(res.body)
    -- send the session id back to the browser as a cookie
    local ok, err = cookie:set({
        key = "metabase.SESSION",
        value = data["id"],
        path = "/",
        domain = ngx.var.host,
        httponly = true,
        -- max_age = 1209600,
        samesite = "Lax",
    })
    if not ok then
        ngx.log(ngx.ERR, err)
        return
    end
    -- also attach it to the current proxied request so the first hit is already authenticated
    ngx.req.set_header("Cookie", "metabase.SESSION=" .. data["id"])
end
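Note that nginx strips environment variables from its workers unless they are declared, so for os.getenv to see the credentials, nginx.conf needs at the top level:
env METABASE_USERNAME;
env METABASE_PASSWORD;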
→ enable concurrent connections
Sounds like we could run multiple instances of MT having the same db. For example sharing the db in the team folder, so that team members share their dashboards.
- h2 likely supports concurrent connections with
h2:file:./data/testdb;AUTO_SERVER=TRUE
- previously metabase was auto_server
→ resources management
→ volume access
Goal:
- In the user directory, files and folders are rw across applications
- In the team directory, files and folders are rw across applications and members of the team
- When a volume is mounted in a container, if the folder does not yet exist, it is created as root and overrides the folder bound in the container. However, if the folder already exists, at least in the docker image or on the host, then it keeps the ownership of its source, hence the explicit chown within the dockerfile -> this is not true with recent docker versions
- the uid and primary gid of the container user are used to create files and folders, unless setuid/setgid are set, in which case the owner/group is kept
- we could use the sticky bit on other to let the container user change the folder user/group at init time (same behavior as the /usr/bin/passwd command)
- ideally all uid/gid should be the same across containers, but it is not possible (rstudio might use 101 while jupyter 102 and so on). While we can set the user from outside, the app might not work with it
- using volume allows to set the uid/gid but only with tmpfs, cifs or NFS; not bind mount
- it is possible to create and configure volumes with a one-liner (see the one-liner after the example below)
- acl on the host won't apply within the container
- if we were able to pre-create the folders on the host it would allow the user to write. Still the apps wouldn't be able to cross edit
- this approach works, however it is not supported by shinyproxy.
FROM ubuntu:22.04
RUN mkdir -p '/foo' ; chown '1001':'1001' '/foo'
# then
docker build -t nico:latest .
docker run -it --rm --user=1001:1001 --mount='source=volumeName,target=/foo,readonly=false' nico:latest ls -alrth /|grep foo
drwxr-xr-x 2 1001 1001 4.0K Sep 10 22:26 foo
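For the uid/gid option mentioned above, a volume one-liner of this kind works for tmpfs-backed volumes (the name and ids are placeholders):
docker volume create --driver local --opt type=tmpfs --opt device=tmpfs --opt o=uid=1001,gid=1001 volumeName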
also we could try to use rootless docker, run by the 1000 user (which is used by jupyter and rstudio)
e.g. to configure an alternative docker url:
proxy.docker.url: URL and port on which to connect to the docker daemon, if not specified ShinyProxy tries to connect using the Unix socket of the Docker daemon
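A minimal application.yml override, assuming the rootless daemon is exposed over TCP on port 2375:
proxy:
  docker:
    url: http://127.0.0.1:2375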
→ LDH folder design
This allows both research projects on HDS infra and courses/misc projects to work with the same design.
Three levels of groups:
- Project: access to a personal folder related to their projects plus a shared folder for all members
- Project-Admin: same as above plus the project-personal folders of every member
- Admin: same as above plus all personal and shared folders
The folder structure can work that way:
- LDH
  - project1-personal
    - user1
    - user2
    - admin-project1
  - project1-shared
  - project2-personal
    - user2
  - project2-shared
Admin would mount:
- LDH:ro
Admin-project1 would mount:
- project1/admin-project1 as project1/personal
- project1-shared as project1/shared
- project1-personal:ro as project1/users
User2 would mount:
- project1/user2 as project1/personal
- project1-shared as project1/shared
- project2/user2 as project2/personal
- project2-shared as project2/shared
Notes:
- If a user has no project, they have no mount point.
- Access to the apps could require a sandbox project
- For admin, the access is read only: it avoids mistakes
The volume expression can work that way:
- #{listToCsv('./data/<repl>/' + userId + ':/root/<repl>/personal', projects)}
- #{listToCsv('./data/<repl>-shared/:/root/<repl>/shared', projects)}
# for admin project
- #{listToCsv('./data/<repl>/:/root/<repl>/users:ro', projects)}
# for admin
- #{listToCsv('./data/:/root/projects:ro', projects)}
→ grafana
We can provide a grafana instance per category: server metrics, logs, postgres metrics... and set a home dashboard in anonymous mode.
depending on the user role, they would have access to more or less container info.
→ access
User access
- server resources
- user container resource usage
- user container logs
Project admin access
- server resources
- project containers
- project containers logs
Admin access
- Server resources
- all containers
- all containers logs
→ logs search
→ postgres metrics
- grafana has a postgres connector so we could query pg at runtime to show metrics such as db size, tables and indexes, queries per time range, etc. (see the queries after this list)
- there is also the postgres metrics exporter to show the server load
- citusdb also supports pg_stat_statements + citus_stat_statements
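A few catalog queries of the kind such a dashboard could run (standard postgres functions; pg_stat_statements requires the extension to be enabled):
-- current database size
SELECT pg_size_pretty(pg_database_size(current_database()));
-- biggest tables, indexes included
SELECT relname, pg_size_pretty(pg_total_relation_size(oid))
FROM pg_class WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC LIMIT 10;
-- slowest queries
SELECT query, calls, mean_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC LIMIT 10;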
→ Onlyoffice
→ Databases
→ Neo4j
- the neo4j browser
- neo4j docker image
- neo4j docker compose
- tutorial cypher
- open dataset
- paradise paper dataset
- demo instance
→ Postgres
→ extensions
- citus is a great candidate for a data analytic platform, since it can start with a single node and then scale-up easily
- citusdb docker image
- citus tutorial
→ access management
Needs:
- one empty db per project
- the admin sets up a database from ldh
- the user can read and write somewhere else
- tools are preconfigured to access the db: cbeaver, metabase, Jupyter, rstudio...
Proposal:
- one db per project
- the project admin is owner and is rw in the shared schema
- the project user is read only on the shared schema and is the owner of their own schema (see the SQL sketch after this list)
- the credentials for each project are in a .pgpass file in each user home
- cbeaver is setup with each db
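A sketch of the grants this implies, with hypothetical role and schema names (project1_admin, alice); the schema statements run while connected to the project database:
CREATE DATABASE project1;
CREATE ROLE project1_admin LOGIN PASSWORD 'changeme';
CREATE ROLE alice LOGIN PASSWORD 'changeme';
-- shared schema: the admin owns it, members read it
CREATE SCHEMA shared AUTHORIZATION project1_admin;
GRANT USAGE ON SCHEMA shared TO alice;
ALTER DEFAULT PRIVILEGES FOR ROLE project1_admin IN SCHEMA shared
  GRANT SELECT ON TABLES TO alice;
-- each member owns a private schema
CREATE SCHEMA alice AUTHORIZATION alice;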
How to maintain the db:
- users/groups are not known before they connect: dbs and users cannot be prepopulated
- admin creds cannot be shared within the containers to create the resources
- a dedicated docker service can listen for ldh containers and infer the user/project/role from the container name. Then it can create, if they do not exist, the dbs/users/credentials and put the latter in pgpass/db connections
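For reference, a .pgpass entry is one line per connection, hostname:port:database:username:password, e.g. (placeholder values):
postgres:5432:project1:alice:secret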
→ mongodb
→ rocksdb
→ tantivy
→ neo4j
→ Jupyter
- There are official docker images
- they use conda and install way too much crap
- some stuff is interesting, such as the healthcheck
- kaggle docker img
→ Custom image
- Rootless, so the user is root inside the container and can install whatever they need (including .deb packages)
- we should only provide the python kernel, with the latest stable version (3.12)
- mainstream libraries will be preinstalled
- only the jupyterlab ui will be available, for the sake of simplicity and feature documentation
- we can document how to install a new python version and create a new kernel (see the sketch after this list)
- pyenv will allow installing any version
- virtualenv to allow custom kernels
- overall, libraries will be installed within the virtualenv located in the user home folder, bound to the host. This means they are shared by project
- example dockerfile pyenv
- bash shell terminal
- for pdf support
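A sketch of the documented flow, assuming pyenv and pyenv-virtualenv are installed in the image (the version and kernel name are examples):
pyenv install 3.11.9
pyenv virtualenv 3.11.9 myproject
# install the kernel spec into the user home so it survives container restarts
~/.pyenv/versions/myproject/bin/pip install ipykernel
~/.pyenv/versions/myproject/bin/python -m ipykernel install --user --name myproject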
→ Accessing postgres
- accessing the postgres instance will be done through the .pgpass file maintained by the pg service, and the jupySQL lib pre-installed
- sqlalchemy likely considers the .pgpass file
- jupySQL provides a general way to store connections
- the INI file way is preferred because it allows listing the existing connections and also choosing between multiple connections easily (see the sketch after this list)
- using the ipython-sql lib
- https://medium.com/analytics-vidhya/postgresql-integration-with-jupyter-notebook-deb97579a38d
- ipython-sql
- ploomber
- jupySQL has replaced ipython-sql
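A sketch of the INI approach as I understand the jupySQL docs (the file location, section name and key names should be double-checked; leaving the password out lets libpq fall back to .pgpass):
; ~/.jupysql/connections.ini
[project1]
drivername = postgresql
host = postgres
port = 5432
database = project1
username = alice
then in a notebook: %load_ext sql followed by %sql --section project1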
→ Extensions
→ vscode
- shiny proxy example
--disable-getting-started-override
--disable-file-downloads
--disable-telemetry
--disable-update-check
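A hypothetical ShinyProxy spec passing those flags to code-server (image, port and auth settings are assumptions):
- id: vscode
  display-name: VS Code
  container-image: codercom/code-server:latest
  container-cmd: ["--bind-addr", "0.0.0.0:8080", "--auth", "none",
                  "--disable-telemetry", "--disable-update-check",
                  "--disable-file-downloads", "--disable-getting-started-override"]
  port: 8080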
→ rstudio
→ databases
So it is possible to have predefined connections, that can be navigated from the connection panel.
- write this into
/etc/rstudio/connections/Postgres\ parisni.R
library(connections)
library(RPostgres)
con <- connection_open(
  RPostgres::Postgres(),
  dbname = "postgres",
  host = "postgres",
  port = 5432,
  user = "parisni",
  password = "pwd"
)
→ airflow
- authent
- auth-manager
- docker image
- docker compose
- airflow enable iframe
- flask sp example
- airflow auth oidc within iframe
- allow airflow iframe
- proxy fix
- redirect login
- other auth
- nginx can rewrite location
- security header
- nginx to change referer for client or backend
we have three options:
- run one webserver per user within SP and all the other services in compose. In that case, use remote_user auth + enable iframe
- run all services in compose, and provide a link within SP. In that case, configure airflow with the user's auth (keycloak, ldap...)
- same as 2. but start an nginx in SP to redirect to the unique webserver. In that case, use an identity proxy to log the user in
option 1. consumes more resources since there is a webserver per user, but the auth part is managed by SP. Starting the webserver takes about 1 min
option 2. shares one webserver among all users, but the auth part is way more complicated to set up
option 3. has all the advantages
In all cases we will need to register users within airflow
Ideas:
- try to redirect to an airflow with no base_url. (No /airflow ?) -> it 302s to /home instead of /app_proxy/.../home
- try proxy fix
- apparently the cross-origin problem comes from http vs https. Where does this http come from? -> the Location is http while the Referer is https
- did the proxy w/o DP also have loc/referer broken for https?
proxy_redirect to replace http with https, plus some sub_filters, did fix 99% of the UI (see the sketch below). Still, jquery is broken; it sounds similar to this and might require activating CSP, or removing require-trusted-types-for 'script'; in the nginx config,
but it has an explicit error that the document needs TrustedHTML
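A sketch of that rewriting location (hostnames are placeholders):
location / {
    proxy_pass http://airflow-webserver:8080;
    # rewrite the http Location headers coming back from airflow
    proxy_redirect http:// https://;
    # rewrite absolute http links in the returned pages; disable compression so sub_filter can act
    proxy_set_header Accept-Encoding "";
    sub_filter_once off;
    sub_filter 'http://your.host' 'https://your.host';
}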
Ideas:
- read on the js error stack
- disable csp of SP
- try in chromium
- disable csp on nginx as we did in metabase
- see what's going on [with airflow nonce](https://stackoverflow.com/a/42924000/3865083)
- disable csrf