My Notes
Productivity, DevOps, Email, Kubernetes, Programming, Python, MongoDB, macOS, REST, RDBMS, PowerShell, SCM, Unix Tools

Notes on Productivity tools
Blog tools
I was very enthusiastic to learn about markdown-level diagramming.
Mermaid
One of the best I have found so far is Mermaid, which I have used with my blogging tool stackedit.io. It renders flowcharts and other diagrams directly from markdown text, which makes diagramming very easy.
asciinema
The tool asciinema records your terminal session and uploads it to the cloud. On macOS, you can install it with brew:
brew install asciinema
XML
Tools for XML
Python
To set up a complete Python environment, see my Python workflow.
To find the directories on the PYTHONPATH:
import sys
from pprint import pprint
pprint(sys.path)
Atom editor for Spark
First set the following path:
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.7-src.zip
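As a quick sanity check, the entries you set in PYTHONPATH should show up on sys.path. A small stdlib-only sketch (PYTHONPATH is split on os.pathsep and prepended when the interpreter starts):

```python
import os
import sys

def pythonpath_entries():
    """Split PYTHONPATH into its individual entries (directories or zip files)."""
    raw = os.environ.get("PYTHONPATH", "")
    return [p for p in raw.split(os.pathsep) if p]

# Each entry set before the interpreter started should appear on sys.path.
for entry in pythonpath_entries():
    print(entry, entry in sys.path)
```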
I have used SDKMAN to install Spark.
Now create a virtual environment:
pyenv global 2.7.18
virtualenv mypython
source mypython/bin/activate
python -m pip install --upgrade pip
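After activating, you can confirm from Python itself that you are inside the virtual environment. A minimal stdlib-only check (sys.prefix differs from the base prefix inside a venv/virtualenv):

```python
import sys

def in_venv():
    """True when the interpreter is running inside a venv/virtualenv."""
    base = getattr(sys, "base_prefix", None) or getattr(sys, "real_prefix", sys.prefix)
    return sys.prefix != base

print(in_venv())
```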
In the virtual environment, install:
pip install ipykernel
If the above is not working, run the following:
python -m ipykernel install --user --name=env
You can open the notebook in the Atom editor and do inline debugging if you install the Hydrogen package in the editor.
If you want to use PySpark, first install
pip install pyspark
To find the installed pyspark version:
pip show pyspark
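pip show reads the installed package metadata; the same information is available from Python via importlib.metadata (stdlib since Python 3.8). A minimal sketch, which returns None when the package is absent:

```python
from importlib import metadata

def pkg_version(name):
    """Return the installed version string of a package, or None if not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None

print(pkg_version("pyspark"))
```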
If you want, install the following packages in the Atom editor:
- Script (to execute Python from the IDE, CMD+i)
- autocomplete-python
- flake8 (to enable it, pip install flake8)
- python-autopep8
Diff
Here is the way to semantically diff XML files. First create your project in a Python virtual environment:
python3 -m venv xmltest
cd xmltest
source bin/activate
Your project is xmltest. Now install the graphtage package:
pip install graphtage
Now you are ready to compare the m1.xml and p1.xml files:
graphtage p1.xml m1.xml
This prints the semantic diff to the CLI. Run deactivate in the CLI to move out of the project environment.
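Why a semantic diff rather than a plain text diff: XML documents that differ textually can still be equivalent, for example because attribute order is not significant. A small stdlib illustration of this (not graphtage itself, just the motivation):

```python
import xml.etree.ElementTree as ET

def same_xml_attrs(x, y):
    """Compare two single-element XML snippets, ignoring attribute order."""
    return ET.fromstring(x).attrib == ET.fromstring(y).attrib

# Textually different, semantically identical: attribute order does not matter in XML.
print(same_xml_attrs('<item id="1" type="book"/>', '<item type="book" id="1"/>'))
```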
VSCode extensions for Python
Some of the extensions tested for Python:
- flake8 and blue: flake8 reports on code styling, among many other issues, and blue rewrites source code according to (most) rules embedded in the black code formatting tool.
- isort: organise imports
- JSON Path Status Bar: Show JSON path of the element
- Output Colorizer: VSCode output in color
- Open Folder Context Menu for VS Code: This will open a new instance of VSCode for the selected folder in the Explorer.
- Pylint: Lint from Microsoft
Spark
I have configured Spark using SDKMAN.
docker run --name pyspark -e JUPYTER_ENABLE_LAB=yes -e JUPYTER_TOKEN="pyspark" -v "$(pwd)":/home/jovyan/work -p 8888:8888 jupyter/pyspark-notebook:d4cbf2f80a2a
Use http://localhost:8888/?token=pyspark to open the Jupyter notebook.
To run the Zeppelin:
docker run -u $(id -u) -p 8080:8080 -p 4040:4040 --rm -v $PWD/logs:/logs -v $PWD/:/notebook -e ZEPPELIN_LOG_DIR='/logs' -e ZEPPELIN_NOTEBOOK_DIR='/notebook' --name zeppelin apache/zeppelin:0.10.0
Command to run Apache Airflow:
docker run -ti -p 8080:8080 -v ${PWD}/<dag>.py:/opt/airflow/dags/download_rocket_launches.py --name airflow --entrypoint=/bin/bash apache/airflow:2.0.0-python3.8 -c '( airflow db init && airflow users create --username admin --password admin --firstname Anonymous --lastname Admin --role Admin --email ojithak@gmail.com); airflow webserver & airflow scheduler'
Docker database containers
Postgres
Create the Docker container (in the current directory, create a data folder first):
docker run -t -i \
--name Mastering-postgres \
--rm \
-p 5432:5432 \
-e POSTGRES_PASSWORD=ojitha \
-v "$(pwd)/data":/var/lib/postgresql/data \
postgres:13.4
To access psql, get a shell inside the container:
docker exec -it Mastering-postgres bash
Inside the shell, run the following command to get into psql:
psql -h localhost -p 5432 -U postgres
MSSQL
Pull the image
docker pull mcr.microsoft.com/mssql/server:2019-latest
To run:
docker run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Pwd@2023" `
-p 1433:1433 --name sql1 --hostname sql1 `
-v C:\Users\ojitha\dev\mssql\data:/var/opt/mssql/data `
-v C:\Users\ojitha\dev\mssql\log:/var/opt/mssql/log `
-d `
mcr.microsoft.com/mssql/server:2019-latest
Download the sample database backup. Run the following fix before restoring:
docker container exec sql1 touch /var/opt/mssql/data/AdventureWorks2019.mdf
docker container exec sql1 touch /var/opt/mssql/log/AdventureWorks2019_log.ldf
Jekyll
To start Jekyll:
bundle exec jekyll serve
Quarto
Quarto is based on Pandoc. Here is the workflow to include a Jupyter notebook in a Jekyll site.
- First create the Jupyter notebook in VSCode and include the YAML front matter in raw form:

  ---
  title: PySpark Date Example
  format:
    html:
      code-fold: true
  jupyter: python3
  ---

- Now copy the ipynb to a temp directory.
- Now run the following command:

  quarto render pyspark_date_example.ipynb --to html

- Copy both the generated folder and the HTML file to the <jekyll root>/_include folder.
- Remove the <!DOCTYPE html> first statement from the HTML page.
- And add the post, such as:

  ---
  layout: post
  title: PySpark Date Example
  date: 2022-03-02
  categories: [Apache Spark]
  ---
  PySpark date in string to date type conversion example. How you can use Python SQL functions like `datediff` to calculate the difference in days.
  <!--more-->
  -- include pyspark_date_example.html using liquid --

  Embed the HTML file into the post using a Liquid include, as in the last line above.
- Now run Jekyll if it is not already started.
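The example post mentions Spark SQL's `datediff`. For reference, the same day-difference calculation in plain Python (no PySpark required) looks like this; the ISO date strings are made-up samples:

```python
from datetime import date

def datediff(end, start):
    """Days between two ISO date strings, in the spirit of Spark SQL's datediff(end, start)."""
    return (date.fromisoformat(end) - date.fromisoformat(start)).days

print(datediff("2022-03-02", "2022-02-28"))  # 2
```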
CSVKit
Create a Postgres Docker container first (the default port is 5432). To import a CSV file into the postgres test database:
csvsql --db postgresql://postgres:ojitha@localhost/test --insert data.csv
NOTE: Table will be created as test.public.data.
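To try the import end to end, you can generate a small data.csv first. A stdlib-only sketch; the invoice_no and amount columns are hypothetical sample data, not from the original file:

```python
import csv

rows = [
    {"invoice_no": "INV-001", "amount": "120.50"},
    {"invoice_no": "INV-002", "amount": "75.00"},
]

def write_sample(path="data.csv"):
    """Write a tiny CSV that csvsql could --insert into Postgres."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["invoice_no", "amount"])
        writer.writeheader()
        writer.writerows(rows)
    return path

def read_back(path):
    """Read the CSV back as a list of dicts, to sanity-check the file."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

write_sample()
print(read_back("data.csv")[0]["invoice_no"])  # INV-001
```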
You can query the table:
SELECT * FROM test.public.data
WHERE "invoice_no" IN (....)
ORDER BY "invoice_no";