Connect to Hazelcast with Jupyter Notebooks
Jupyter Notebook is an open-source web tool that lets data scientists create and share documents with live code and equations. Because of its interactivity and data presentation features, the Jupyter Notebook is considered a powerful tool for scientific projects. It also lets you use a web browser to edit and run notebook documents.
Hazelcast IMDG, on the other hand, is a Java-based, open-source in-memory data grid that gives your Python applications access to a wide selection of massively scalable data structures. Its Near Cache feature keeps frequently read data local to your Python application, so repeated reads are faster. Hazelcast's IMDG Python client also provides access to the distributed data structures, including queues and topics, and supports JavaScript Object Notation (JSON).
In this tutorial, you'll be connecting to Hazelcast using Jupyter Notebooks with Python and SQL.
Prerequisites
Before getting started with this tutorial, ensure you have the following:
- Python 3.x installed on your computer
- OpenJDK installed
- Hazelcast CLI installed
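If you are unsure whether these tools are already available, a small check like the following can help. It is only a sketch: it assumes the OpenJDK and Hazelcast CLI executables are named `java` and `hz` and are on your PATH, which may differ on your system.

```python
import shutil
import sys


def check_prerequisites(tools=("java", "hz")):
    """Return a dict mapping each prerequisite to the path where it was found (or None)."""
    # The interpreter running this code is, by definition, an available Python.
    found = {"python": sys.executable}
    for tool in tools:
        # shutil.which returns the full path of an executable, or None if it is not on PATH.
        found[tool] = shutil.which(tool)
    return found


for name, location in check_prerequisites().items():
    print(f"{name}: {location or 'NOT FOUND'}")
```

A `NOT FOUND` entry only means the executable is not on your PATH under that name; the tool may still be installed elsewhere.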
Installing Jupyter Notebook
To get started, create a folder for this project. Open your terminal and run the command below:
mkdir hazelcast && cd hazelcast
Then create and activate a virtual environment by running the following commands:
# install virtualenv
pip install virtualenv
# create the virtual environment
virtualenv env
# activate the virtual environment
source env/bin/activate
If you see a similar output to the one in the screenshot below, your virtual environment has been successfully activated.
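Shell prompts vary, so you may prefer to confirm the activation from Python itself. The sketch below compares the interpreter's prefixes; the `in_virtualenv` helper name is made up for this example.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base Python installation.
    # Older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base_prefix


print("Virtual environment active:", in_virtualenv())
```

Run it with the `python` that your activated shell resolves to, for example by saving it to a file and executing `python <file>`.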
Now install and run the Jupyter notebook with the command below:
# install
pip install jupyter notebook
# run
jupyter-notebook
The commands above install Jupyter Notebook, start it, and open it in your browser on port 8888.
Finally, create a notebook and install the Hazelcast Python client in it. Click the New button at the top right corner of the Jupyter Notebook page in your browser, and select Python 3 (ipykernel).
You should have a new notebook opened in a new tab, as shown in the screenshot below:
In the notebook cell, paste the code below and press Run to install the Hazelcast Python client.
!pip install "hazelcast-python-client[stats]"
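To confirm that the install succeeded in the kernel you are actually using, you can look up the installed version from a cell. This is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the `installed_version` helper is a name chosen for this example.

```python
from importlib import metadata


def installed_version(distribution="hazelcast-python-client"):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print("hazelcast-python-client:", version or "not installed")
```

If it reports "not installed" right after a successful `pip install`, the notebook kernel is probably running in a different environment than the one pip installed into.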
Creating a Hazelcast cluster
With the Jupyter notebook installed and set up, proceed to create a Hazelcast cluster for your application. To get started, you need to create a free-tier account.
Once the sign-up is completed, you'll be redirected to the cluster page. Click the CREATE NEW CLUSTER button.
You'll then be redirected to the plans page, where you'll be asked to choose your preferred plan. Choose any plan that suits your project; this tutorial uses the Basic Free plan.
Next is the info page. Leave everything at its default value and click the CREATE FREE CLUSTER button.
Lastly, you will be redirected to the Confirmation and Summary page. If you are satisfied with the order for your use case, press the CONTINUE CREATING CLUSTER button.
At this point, your cluster should be created and ready to be used in your applications.
Connecting the Cluster to Jupyter
Now to connect your Hazelcast cluster to your Python project, click on the CONNECT YOUR APPLICATION button.
Then select Python from the select field at the top right-hand side of the page. Hazelcast will then provide you with instructions on how to download the zip file. Carefully follow the instructions to download and run the file.
Once you have tested the sample Hazelcast client against your cluster, connect the cluster to your own Python application. In your Jupyter notebook, import hazelcast and the logging module.
import hazelcast
import logging
Then connect your application to your Hazelcast cluster with the code snippet below:
logging.basicConfig(level=logging.INFO)
client = hazelcast.HazelcastClient(
    cluster_name="<NAME>",
    cloud_discovery_token="<TOKEN>",
    statistics_enabled=True,
)
Replace <NAME> and <TOKEN> with your cluster name and discovery token. You can find them on the Python Client Quick Setup page in the Python Client Advanced Setup section.
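Notebooks are often shared, so you may not want the discovery token pasted into a cell. One option is to read both values from environment variables and build the client arguments from them. The sketch below assumes two variable names, `HZ_CLUSTER_NAME` and `HZ_DISCOVERY_TOKEN`, which are invented for this example and are not defined by Hazelcast.

```python
import os


def cloud_client_config(env=os.environ):
    """Build HazelcastClient keyword arguments from environment variables.

    HZ_CLUSTER_NAME and HZ_DISCOVERY_TOKEN are hypothetical variable names
    chosen for this sketch; Hazelcast does not define them.
    """
    required = ("HZ_CLUSTER_NAME", "HZ_DISCOVERY_TOKEN")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {
        "cluster_name": env["HZ_CLUSTER_NAME"],
        "cloud_discovery_token": env["HZ_DISCOVERY_TOKEN"],
        "statistics_enabled": True,
    }


# Demonstration with an in-memory mapping instead of the real environment.
config = cloud_client_config({"HZ_CLUSTER_NAME": "demo", "HZ_DISCOVERY_TOKEN": "secret"})
print(sorted(config))
```

With the variables set before Jupyter starts, the connection cell would then become `client = hazelcast.HazelcastClient(**cloud_client_config())`.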
Saving Data with SQL
To save data in your Hazelcast cluster using SQL, you first need to create a mapping. Create a cities mapping for your cluster with the code snippet below.
def run_sql_mapping(client):
    print("Creating a mapping...")
    mapping_query = "CREATE OR REPLACE MAPPING cities TYPE IMap " \
                    "OPTIONS ('keyFormat'='varchar','valueFormat'='varchar')"
    client.sql.execute(mapping_query).result()
    print("The mapping has been created successfully.")
Once your mapping has been created, you'll see a similar output to the one on the screenshot below on the cell.
Next, extend the run_sql_mapping function to insert new records by adding the code snippet below to its body:
    print("Inserting data via SQL...")
    insert_query = """
    INSERT INTO cities VALUES
    ('Australia','Canberra'),
    ('Croatia','Zagreb'),
    ('Czech Republic','Prague'),
    ('England','London'),
    ('Turkey','Ankara'),
    ('United States','Washington, DC');
    """
    client.sql.execute(insert_query).result()
    print("The data has been inserted successfully.")
    print("--------------------")
The snippet above inserts six country and capital pairs into the cities map. When you run the application, you will see the output below in the cell.
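If the rows come from a Python list rather than a hand-written string, you could generate the statement with `?` placeholders instead of embedding the values in the SQL text. The helper below is a sketch, not part of the Hazelcast API: `build_insert` is a name invented here, and it assumes two-column rows and a trusted table name (the table name is interpolated directly and is not a parameter).

```python
def build_insert(table, rows):
    """Return an INSERT statement with ? placeholders and a flat parameter list."""
    if not rows:
        raise ValueError("rows must not be empty")
    # One "(?, ?)" group per (key, value) row.
    placeholders = ", ".join("(?, ?)" for _ in rows)
    query = f"INSERT INTO {table} VALUES {placeholders}"
    # Flatten [(k1, v1), (k2, v2)] into [k1, v1, k2, v2] to match placeholder order.
    params = [value for row in rows for value in row]
    return query, params


query, params = build_insert("cities", [("Australia", "Canberra"), ("Croatia", "Zagreb")])
print(query)
print(params)
```

Assuming your client version accepts positional parameters after the SQL string, as the 5.x Python client does, the statement would be run with `client.sql.execute(query, *params).result()`. Whether placeholders need explicit casts can depend on the Hazelcast version, so treat this as a starting point.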
Lastly, extend the run_sql_mapping function to retrieve the data you just inserted by adding the code snippet below to its body.
    print("Retrieving all the data via SQL...")
    result = client.sql.execute("SELECT * FROM cities").result()
    for row in result:
        country = row[0]
        city = row[1]
        print("%s - %s" % (country, city))
    print("--------------------")
You should also see the output below when you run the application.
To test it, call run_sql_mapping(client) in a cell and click the play button to run it.
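Printing is enough for a demo, but in a notebook you will often want the rows in a Python structure for further work. The sketch below collects two-column rows into a dictionary; `rows_to_dict` is a name chosen for this example, and it relies only on index access (`row[0]`, `row[1]`), the same access used in the retrieval loop.

```python
def rows_to_dict(rows):
    """Collect (key, value) rows into a dictionary keyed by the first column."""
    return {row[0]: row[1] for row in rows}


# Stand-in rows so the helper can be tried without a cluster.
sample_rows = [("Australia", "Canberra"), ("Croatia", "Zagreb")]
capitals = rows_to_dict(sample_rows)
print(capitals["Croatia"])
```

With a live cluster, you would pass the query result instead of the sample list, for example `rows_to_dict(client.sql.execute("SELECT * FROM cities").result())`.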
Saving Data to Memory
Hazelcast provides many distributed data structures for keeping data in memory on your cluster. The distributed map is one of the most commonly used: it stores key/value pairs that are partitioned across the cluster members and replicated as backups. To get started, start a local Hazelcast cluster with the command below:
hz start
If you see the output below, then your Hazelcast local cluster has successfully started.
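You can also check from Python that something is listening on the member port before connecting. The sketch below assumes the member runs on the same machine and uses Hazelcast's default port, 5701; adjust both if your setup differs. The `is_port_open` helper is a name invented for this example.

```python
import socket


def is_port_open(host="127.0.0.1", port=5701, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


print("Member reachable on 5701:", is_port_open())
```

A `True` result only shows that some process accepts connections on that port; it does not prove that the process is a Hazelcast member.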
Then, open another terminal tab and start the console with the command below:
hz-cli console
You should see the output below on your console. Now add the code below to a Jupyter cell to import the client, connect to the local Hazelcast cluster, and get a distributed map.
import hazelcast
client = hazelcast.HazelcastClient()
distributed_map = client.get_map("distributed-map")
Next, add some data to the map, read it back, and print the map size with the code snippet below:
# Add some data
distributed_map.set("name", "John Doe").result()
distributed_map.set("age", "21").result()
# Get the data
get_name = distributed_map.get("name")
get_age = distributed_map.get("age")
get_name.add_done_callback(lambda future: print(future.result()))
get_age.add_done_callback(lambda future: print(future.result()))
print("Map size:", distributed_map.size().result())
# Shut down the client.
client.shutdown()
The client.shutdown() call shuts down the client once the operations have been performed. If you run the application, you should see the output below.
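The map calls in this section return futures, which is why each one is followed by `.result()` or a callback. In a notebook it can be simpler to work with a blocking view of the map, where every call returns its value directly. The sketch below shows that style; `store_profile` and `InMemoryMap` are names invented for this example, and `InMemoryMap` is only a stand-in so the helper can be tried without a cluster.

```python
def store_profile(blocking_map, profile):
    """Write each key/value pair to the map, then read everything back."""
    for key, value in profile.items():
        blocking_map.set(key, value)
    # Each get returns the stored value directly, so no callbacks are needed.
    stored = {key: blocking_map.get(key) for key in profile}
    return stored, blocking_map.size()


class InMemoryMap:
    """Hypothetical stand-in exposing the same set/get/size calls, for offline use."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    def size(self):
        return len(self._data)


stored, size = store_profile(InMemoryMap(), {"name": "John Doe", "age": "21"})
print(stored, "Map size:", size)
```

Against a running cluster, you would pass `client.get_map("distributed-map").blocking()` instead of the stand-in, and call it before `client.shutdown()`.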
Conclusion
Throughout this tutorial, you've learned how to connect to your Hazelcast cluster from Jupyter notebooks. We started with an introduction to both tools, then built a demo application that saves and retrieves data with SQL and with a distributed map. Now that you have this knowledge, how would you use Hazelcast in your next Python project? Feel free to learn more about Hazelcast from the official documentation.