07 Example - Describing a distribution

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.graph_objects as go
import findspark; findspark.init()
from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
from pyspark.sql import functions as F
spark = SparkSession.builder.appName('statistics').master('local').getOrCreate()

Example - Describing a distribution fig 1

dataset = 1 * [4] + 1 * [5] + 1 * [7] + 1 * [10] + 1 * [19] + 2 * [21] + 6 * [23] +  5 * [24] + 1 * [25]
bins = np.arange(0, 30)
max(dataset) - min(dataset)
21
np.median(dataset)
23.0
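The range and median above cover spread and center. As a minimal sketch, the cell below adds the mean and the interquartile range for the same scores, using only NumPy. The variable names (`scores`, `data_range`, `iqr`) are illustrative and not part of the original notebook, and `np.percentile` uses its default linear interpolation, which is only one of several quartile conventions.

```python
import numpy as np

# Same scores as in the notebook cell above.
dataset = 1 * [4] + 1 * [5] + 1 * [7] + 1 * [10] + 1 * [19] + 2 * [21] + 6 * [23] + 5 * [24] + 1 * [25]
scores = np.array(dataset)

# Center: the median resists extreme values, the mean does not.
median = np.median(scores)
mean = scores.mean()

# Spread: the full range, plus the interquartile range (middle 50% of scores).
data_range = scores.max() - scores.min()
q1, q3 = np.percentile(scores, [25, 75])  # default linear interpolation (an assumption)
iqr = q3 - q1

# Shape: a mean below the median suggests a tail toward the low scores.
print(f"median={median}, mean={mean:.2f}, range={data_range}, IQR={iqr}")
print("mean below median:", mean < median)
```

Here the mean falls below the median because the few low scores (4, 5, 7, 10) pull it down, which matches a distribution skewed to the left.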
plt.hist(dataset, bins, color='#23b9cb')
plt.xlabel('Scores')
plt.show()
../_images/07 Example - Describing a distribution_7_0.png
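The bar heights in the histogram can also be read off numerically. The following is a sketch that assumes the same `bins` as the plot and uses `np.histogram`; the names `counts`, `edges`, `tallest`, and `low_tail` are illustrative and not from the original notebook.

```python
import numpy as np

# Same scores and bin edges as in the plotting cell above.
dataset = 1 * [4] + 1 * [5] + 1 * [7] + 1 * [10] + 1 * [19] + 2 * [21] + 6 * [23] + 5 * [24] + 1 * [25]
bins = np.arange(0, 30)

# counts[i] is the number of scores falling in [edges[i], edges[i + 1]).
counts, edges = np.histogram(dataset, bins=bins)

# The tallest bar marks the peak of the distribution.
tallest = np.argmax(counts)
print(f"Most common score: {edges[tallest]} ({counts[tallest]} occurrences)")

# Scores below 19 form the sparse low tail.
low_tail = counts[:19].sum()
print(f"Scores below 19: {low_tail} of {counts.sum()}")
```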
# histplot replaces the deprecated distplot; kde=True and stat='density' keep the same density histogram with a KDE curve
sns.histplot(dataset, bins=bins, kde=True, stat='density', color='#23b9cb')
plt.xlabel('Scores')
plt.show()
../_images/07 Example - Describing a distribution_8_1.png
data = [go.Histogram(x=dataset, nbinsx=30, marker_color='#23b9cb')]
fig = go.Figure(data=data)
fig.update_layout(
    xaxis_title='Scores'
)
fig.show()