<languages />
[[Category:Software]]

[https://arrow.apache.org/ Apache Arrow] is a cross-language development platform for in-memory data. It uses a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations. It also provides computational libraries and zero-copy streaming messaging and interprocess communication. Languages currently supported include C, C++, C#, Go, Java, JavaScript, MATLAB, Python, R, Ruby, and Rust.

== CUDA ==
Arrow is also available with CUDA.
{{Command|module load gcc arrow/X.Y.Z cuda}}
where X.Y.Z represent the desired version.

== Python bindings ==
The module contains bindings for multiple Python versions. 
To discover which are the compatible Python versions, run
{{Command|module spider arrow/X.Y.Z}}
where <tt>X.Y.Z</tt> represent the desired version.

Or search directly ''pyarrow'', by running
{{Command|module spider pyarrow}}

=== PyArrow ===
The Arrow Python bindings (also named ''PyArrow'') have first-class integration with NumPy, Pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.

1. Load the required modules.
{{Command|module load gcc arrow/X.Y.Z python/3.11}}
where <tt>X.Y.Z</tt> represent the desired version.

2. Import PyArrow.
{{Command|python -c "import pyarrow"}}

If the command displays nothing, the import was successful.

For more information, see the [https://arrow.apache.org/docs/python/ Arrow Python] documentation.

==== Fulfilling other Python package dependency ====
Other Python packages depends on PyArrow in order to be installed.
With the <code>arrow</code> module loaded, your package dependency for <code>pyarrow</code> will be satisfied.
{{Command
|pip list {{!}} grep pyarrow
|result=
pyarrow    17.0.0
}}

==== Apache Parquet format ====
The [http://parquet.apache.org/ Parquet] file format is available. 

To import the Parquet module, execute the previous steps for <code>pyarrow</code>, then run
{{Command|python -c "import pyarrow.parquet"}}

If the command displays nothing, the import was successful.

== R bindings ==
The Arrow package exposes an interface to the Arrow C++ library to access many of its features in R. This includes support for analyzing large, multi-file datasets ([https://arrow.apache.org/docs/r/reference/open_dataset.html open_dataset()]), working with individual Parquet files ([https://arrow.apache.org/docs/r/reference/read_parquet.html read_parquet()], [https://arrow.apache.org/docs/r/reference/write_parquet.html write_parquet()]) and Feather files ([https://arrow.apache.org/docs/r/reference/read_feather.html read_feather()], [https://arrow.apache.org/docs/r/reference/write_feather.html write_feather()]), as well as lower-level access to the Arrow memory and messages.

=== Installation ===
1. Load the required modules.
{{Command|module load StdEnv/2020 gcc/9.3.0 arrow/8 r/4.1 boost/1.72.0}}

2. Specify the local installation directory.
{{Commands
|mkdir -p ~/.local/R/$EBVERSIONR/
|export R_LIBS{{=}}~/.local/R/$EBVERSIONR/
}}

3. Export the required variables to ensure you are using the system installation.
{{Commands
|export PKG_CONFIG_PATH{{=}}$EBROOTARROW/lib/pkgconfig
|export INCLUDE_DIR{{=}}$EBROOTARROW/include
|export LIB_DIR{{=}}$EBROOTARROW/lib
}}

4. Install the bindings.
{{Command|R -e 'install.packages("arrow", repos{{=}}"https://cloud.r-project.org/")'}}

=== Usage ===
After the bindings are installed, they have to be loaded.

1. Load the required modules.
{{Command|module load StdEnv/2020 gcc/9.3.0 arrow/8 r/4.1}}

2. Load the library.
{{Command
|R -e "library(arrow)"
|result=
> library("arrow")
Attaching package: ‘arrow’
}}

For more information, see the [https://arrow.apache.org/docs/r/index.html Arrow R documentation]