Thursday, February 4, 2021

How to read parquet files in C#

Could anyone please tell me how to read Parquet files in C#? I have tried Parquet.Net. It works fine for generating Parquet files, but I get the error below when reading one back. The file was generated with the same code shown in https://github.com/elastacloud/parquet-dotnet, and I have validated it, so the file itself is valid.

"message": "not a Parquet file(head is '')",  
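The error says the reader found an empty head ('') where it expected the Parquet magic bytes. The Parquet format specifies that every file begins and ends with the 4-byte ASCII marker "PAR1", so one way to narrow this down is to inspect the file on disk independently of Parquet.Net. Below is a minimal sketch that uses only System.IO. The class ParquetFileCheck and its method Describe are hypothetical names introduced for illustration. The sketch assumes the problem would be visible in the file itself, for example a zero-length file at the moment it is read. If it reports "ok", the cause presumably lies elsewhere, such as in how the stream is opened.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical diagnostic helper (not part of Parquet.Net). The Parquet format
// requires every file to start and end with the 4-byte ASCII magic "PAR1".
public static class ParquetFileCheck
{
    private const string Magic = "PAR1";

    // Returns "ok" if both magic markers are present, otherwise a short reason.
    public static string Describe(string path)
    {
        var info = new FileInfo(path);
        if (!info.Exists)
            return "file does not exist";

        // Head magic + 4-byte footer length + tail magic = 12 bytes, so anything
        // shorter cannot be a valid Parquet file (an empty file lands here).
        if (info.Length < 12)
            return $"file is too short ({info.Length} bytes)";

        using (FileStream fs = File.OpenRead(path))
        {
            string head = ReadAscii(fs, 0, 4);
            string tail = ReadAscii(fs, info.Length - 4, 4);
            if (head != Magic) return $"bad head magic '{head}'";
            if (tail != Magic) return $"bad tail magic '{tail}'";
            return "ok";
        }
    }

    // Reads up to `count` bytes starting at `offset` and decodes them as ASCII.
    private static string ReadAscii(Stream s, long offset, int count)
    {
        s.Seek(offset, SeekOrigin.Begin);
        var buffer = new byte[count];
        int read = 0;
        while (read < count)
        {
            int n = s.Read(buffer, read, count - read);
            if (n == 0) break; // stream ended early
            read += n;
        }
        return Encoding.ASCII.GetString(buffer, 0, read);
    }
}
```

For example, calling ParquetFileCheck.Describe("C:\\Users\\snelaturu\\work\\test.parquet") right before opening the ParquetReader shows whether the file has content and both markers at that point.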

Below is the code I used to read the file:

    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Linq;
    using System.Text;
    using Parquet;
    using Parquet.Data;

    namespace ReadParquet
    {
        class Program
        {
            static void Main(string[] args)
            {
                // open file stream
                using (Stream fileStream = System.IO.File.OpenRead("C:\\Users\\snelaturu\\work\\test.parquet"))
                {
                    // open parquet file reader
                    using (var parquetReader = new ParquetReader(fileStream))
                    {
                        // get file schema (available straight after opening parquet reader)
                        // however, get only data fields as only they contain data values
                        DataField[] dataFields = parquetReader.Schema.GetDataFields();

                        // enumerate through row groups in this file
                        for (int i = 0; i < parquetReader.RowGroupCount; i++)
                        {
                            // create row group reader
                            using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
                            {
                                // read all columns inside each row group (you have the option to read
                                // only the columns you need)
                                DataColumn[] columns = dataFields.Select(groupReader.ReadColumn).ToArray();

                                // get first column, for instance
                                DataColumn firstColumn = columns[0];

                                // .Data contains a typed array of column data you can cast to the column's type;
                                // this cast throws InvalidCastException unless the column holds non-nullable ints
                                Array data = firstColumn.Data;
                                int[] ids = (int[])data;
                            }
                        }
                    }
                }
            }
        }
    }
https://stackoverflow.com/questions/66057229/how-to-read-parquet-files-in-c-sharp February 05, 2021 at 11:37AM
