Documentation

VoroClust is available to Sandians on request. The best way receive permission to access the repository is to reach out to the team via the About / Contact Us page on this website. VoroClust currently has a government-use license, so external government partners can also be given access.

Getting VoroClust

You will need to get a copy of VoroClust. You can clone the Git repo like so:

with ssh:

$ git clone git@cee-gitlab.sandia.gov:vorocrust/voroclust.git

Or with https:

$ git clone https://cee-gitlab.sandia.gov/vorocrust/voroclust.git

Once you have the VoroClust software, the folder structure will be as follows:

Image of voroclust_folders.png

Using VoroClust

There are 2 ways that users can implement the VoroClust software. Users can build the C++ code and produce an executable that they can then run with a config file; or users can install VoroClust as a Python package that they can then import into their Python scripts.

Method 1 – Using with Python

Installation Instructions

The easiest way to use voroclust is with the python package. The package can be built from the repository be moving into the voroclust/python directory and executing the following command:

python -m pip install .

With python 3.10+, this should handle all dependencies automatically. Further troubleshooting information can be found in the repository README.

Usage Instructions

Once the package is installed, voroclust can be imported and used in any python script.

from voroclust import voroclust
import numpy as np
import matplotlib.pyplot as plt

###print full documentation
help(voroclust)

data = np.loadtxt("path/to/data.csv", delimiter=',')
size = data.shape[0]
dimensions = data.shape[1]

vc = voroclust(data.flatten(),
               data_size=size,
               data_dimensions=dimensions,
               radius=.35,
               detail_ceiling=.8,
               descent_limit=.2,
               num_threads=12,
               data_tree_filename="")

###Optional: skips sphere generation, to allow faster tuning of secondary parameters
#vc.loadSpheres("spheres.bin")

vc.execute()

###Choice of post-processing steps
vc.labelByMaxClusters(4)
#vc.labelNoise(.05)

###Optional: write some data to files which can be loaded on subsequent runs to save time
#vc.writeDataTree("tree.bin")
#vc.writeSpheres("spheres.bin")

data_labels = vc.getLabels()

###plotting results if it's just a 2D dataset
plt.scatter(data[:,0], data[:,1], c=data_labels)
plt.show()

###for higher dimensional data, results can be plotted by reshaping
###and using labels to color pixels of an image
# pixel_rows=628
# pixel_columns=557
# labels_reshaped = np.reshape(data_labels, (pixel_rows, pixel_columns))  
# plt.imshow(labels_reshaped, interpolation="none")
# plt.show()

Method 2 – Using as Executable

Build

As an alternative to Python, you can build and run an executable directly. Start by creating a build directory at the root level of the VoroClust project (voroclust/build), and using cmake to build:

cd voroclust
mkdir build
cd build
cmake ..

On linux or MacOS, you can finish the build with:

make

On Windows with visual studio, cmake will have generated a VoroClust.sln file in the build folder. You can finish the build by opening this in Visual Studio, and selecting Build –> Build Solution in the top bar.

You should now have voroclust.exe in your build folder, or child folders depending on settings.

Execute

Once you have the executable, you need 2 things, a data file and configure file. The data file will need to be a .csv file and the configure file will need to end with extension .in.

VoroClust expects a csv datafile where each row is an n-dimensional data point. The data file will look similar to this one:

Image of datafileexample.png

And the configure file should look similar to this. The format is just a simple set of key-value pairs.

DATA_FILE=path/to/data.csv
OUTPUT_FOLDER=path/to/output/folder/
RADIUS=.115
NOISE_THRESHOLD=.05
DETAIL_CEILING=.9
DESCENT_LIMIT=.25
NUM_THREADS=-1
WRITE_SPHERE_FILE=path/to/spheres.bin
#READ_SPHERE_FILE=path/to/spheres.bin

The parameters include-able in the configure file are:

  • DATA_FILE – This parameter specifies which data file you want to use
  • OUTPUT_FOLDER – This allows user to choose where the output is written to
  • RADIUS – The radius of spheres used to cover the domain
  • NOISE_THRESHOLD – Determines the fraction of data points which will be labeled as noise in a post processing step
  • DETAIL_CEILING – Value between 0 and 1. Controls clustering propagation.
  • DESCENT_LIMIT – Value between 0 and 1. Controls clustering propagation. DETAIL_CEILING should be greater than DESCENT_LIMIT.
  • FIXED_SEED – Set a fixed seed. Defaults to -1 (random operation)
  • NUM_THREADS – Number of OpenMP threads to use. Defaults to 1.
  • READ_DATA_TREE_FILE – To save time, we can load the data’s Kd-Tree from a .bin file, rather than recomputing it.
  • WRITE_DATA_TREE_FILE – Write the Kd-Tree to a .bin file for future use.
  • READ_SPHERE_FILE – To save time, we can load the sphere cover from a .bin file, rather than recomputing it.
  • WRITE_SPHERE_FILE – Write the sphere cover to a .bin file for future use.
  • WRITE_DATA_BIN_FILE – Given a .csv DATA_FILE, writes the data to a .bin for reduced storage and faster loading. If this parameter is present, will skip clustering and ONLY write the file.

Using the Executable

Once you have built the executable and you have the config and data file, the command line usage is as follows:

./voroclust config.in

Running this command will produce a file titled something along the lines of data_labels_0.100000_0.850000_0.150000.csv, the significance of the file name being the parameters used during the execution.

Data Format

The VoroClust executable expects a .csv data file as an input. While this is simple and cross-platform, it is not efficient for large datasets. Using the Python package grants significantly more flexibility, as any format that can be loaded and processed in Python can be used. If you can’t use the Python package, there is a configuration parameter WRITE_DATA_BIN_FILE. If it is included in the configuration along with an input .csv and output folder, voroclust.exe will parse the input .csv and write the data out to a much more compact .bin file that can be used in future runs.

Additional Help

If you need any additional help with using VoroClust please do not hesitate to reach out to us.