# dataframe-go

**Repository Path**: michael83/dataframe-go

## Basic Information

- **Project Name**: dataframe-go
- **Description**: fork from github
- **Primary Language**: Go
- **License**: Not specified
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2024-06-13
- **Last Updated**: 2024-12-02

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

⭐   the project to show your appreciation. :arrow_upper_right:

dataframe-go

Dataframes are used for statistics, machine-learning, and data manipulation/exploration. You can think of a Dataframe as an Excel spreadsheet. This package is designed to be light-weight and intuitive.

⚠️ The package is production ready but the API is not stable yet. Once Go 1.18 (Generics) is introduced, the **ENTIRE** package will be rewritten. For example, there will only be 1 generic Series type. After that, version `1.0.0` will be tagged.

It is recommended your package manager locks to a commit id instead of the master branch directly. ⚠️

# Features

1. Importing from CSV, JSONL, Parquet, MySQL & PostgreSQL
2. Exporting to CSV, JSONL, Excel, Parquet, MySQL & PostgreSQL
3. Developer Friendly
4. Flexible - Create custom Series (custom data types)
5. Performant
6. Interoperability with [gonum package](https://godoc.org/gonum.org/v1/gonum).
7. [pandas sub-package](https://pandas.pydata.org/pandas-docs/stable/reference/frame.html) ![Help Required](https://img.shields.io/badge/help-required-blueviolet)
8. Fake data generation
9. Interpolation (ForwardFill, BackwardFill, Linear, Spline, Lagrange)
10. Time-series Forecasting (SES, Holt-Winters)
11. Math functions
12. Plotting (cross-platform)

See [Tutorial](https://github.com/rocketlaunchr/dataframe-go#tutorial) here.

## Installation

```
go get -u github.com/rocketlaunchr/dataframe-go
```

```go
import dataframe "github.com/rocketlaunchr/dataframe-go"
```

# DataFrames

## Creating a DataFrame

```go
// A nil value becomes a missing value (shown as NaN in the table).
s1 := dataframe.NewSeriesInt64("day", nil, 1, 2, 3, 4, 5, 6, 7, 8)
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2, nil, nil, 84.2, 72, 89)
df := dataframe.NewDataFrame(s1, s2)

fmt.Print(df.Table())

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   1   |  50.3   |
| 1:  |   2   |  23.4   |
| 2:  |   3   |  56.2   |
| 3:  |   4   |   NaN   |
| 4:  |   5   |   NaN   |
| 5:  |   6   |  84.2   |
| 6:  |   7   |   72    |
| 7:  |   8   |   89    |
+-----+-------+---------+
| 8X2 | INT64 | FLOAT64 |
+-----+-------+---------+
```

[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/eC5HYAEHjNI)

## Insert and Remove Row

```go
// Append a row positionally, then a row keyed by series name.
df.Append(nil, 9, 123.6)

df.Append(nil, map[string]interface{}{
	"day":   10,
	"sales": nil,
})

// Remove the first row.
df.Remove(0)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   2   |  23.4   |
| 1:  |   3   |  56.2   |
| 2:  |   4   |   NaN   |
| 3:  |   5   |   NaN   |
| 4:  |   6   |  84.2   |
| 5:  |   7   |   72    |
| 6:  |   8   |   89    |
| 7:  |   9   |  123.6  |
| 8:  |  10   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+
```

[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/xwW_410vQ2p)

## Update Row

```go
df.UpdateRow(0, nil, map[string]interface{}{
	"day":   3,
	"sales": 45,
})
```

## Sorting

```go
// Sort by sales (descending), breaking ties by day (descending).
sks := []dataframe.SortKey{
	{Key: "sales", Desc: true},
	{Key: "day", Desc: true},
}

df.Sort(ctx, sks)

OUTPUT:
+-----+-------+---------+
|     |  DAY  |  SALES  |
+-----+-------+---------+
| 0:  |   9   |  123.6  |
| 1:  |   8   |   89    |
| 2:  |   6   |  84.2   |
| 3:  |   7   |   72    |
| 4:  |   3   |  56.2   |
| 5:  |   2   |  23.4   |
| 6:  |  10   |   NaN   |
| 7:  |   5   |   NaN   |
| 8:  |   4   |   NaN   |
+-----+-------+---------+
| 9X2 | INT64 | FLOAT64 |
+-----+-------+---------+
```

[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/lsJkKw3ZUJq)

## Iterating

You can change the step and starting row. It may be wise to lock the DataFrame before iterating.

The returned value is a map containing the name of the series (`string`) and the index of the series (`int`) as keys.

```go
iterator := df.ValuesIterator(dataframe.ValuesOptions{0, 1, true}) // Don't apply read lock because we are write locking from outside.

df.Lock()
for {
	row, vals, _ := iterator()
	if row == nil {
		break // no more rows
	}
	fmt.Println(*row, vals)
}
df.Unlock()

OUTPUT:
0 map[day:1 0:1 sales:50.3 1:50.3]
1 map[sales:23.4 1:23.4 day:2 0:2]
2 map[day:3 0:3 sales:56.2 1:56.2]
3 map[1:<nil> day:4 0:4 sales:<nil>]
4 map[day:5 0:5 sales:<nil> 1:<nil>]
5 map[sales:84.2 1:84.2 day:6 0:6]
6 map[day:7 0:7 sales:72 1:72]
7 map[day:8 0:8 sales:89 1:89]
```

[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/eqjvu-vO8sr)

## Statistics

You can easily calculate statistics for a Series using the [gonum](https://godoc.org/gonum.org/v1/gonum/stat) or [montanaflynn/stats](https://godoc.org/github.com/montanaflynn/stats) package.

`SeriesFloat64` and `SeriesTime` provide access to the exported `Values` field to seamlessly interoperate with external math-based packages.

### Example

Some series provide easy conversion using the `ToSeriesFloat64` method.

```go
import "gonum.org/v1/gonum/stat"

s := dataframe.NewSeriesInt64("random", nil, 1, 2, 3, 4, 5, 6, 7, 8)
sf, _ := s.ToSeriesFloat64(ctx) // sf.Values is a []float64
```

### Mean

```go
mean := stat.Mean(sf.Values, nil)
```

### Median

```go
import "github.com/montanaflynn/stats"

median, _ := stats.Median(sf.Values)
```

### Standard Deviation

```go
std := stat.StdDev(sf.Values, nil)
```

## Plotting (cross-platform)

```go
import (
	chart "github.com/wcharczuk/go-chart"
	"github.com/rocketlaunchr/dataframe-go/plot"
	wc "github.com/rocketlaunchr/dataframe-go/plot/wcharczuk/go-chart"
)

sales := dataframe.NewSeriesFloat64("sales", nil, 50.3, nil, 23.4, 56.2, 89, 32, 84.2, 72, 89)

// Convert the series into a go-chart series.
cs, _ := wc.S(ctx, sales, nil, nil)

graph := chart.Chart{Series: []chart.Series{cs}}

// Render the chart as SVG into a plot window and wait until it is closed.
plt, _ := plot.Open("Monthly sales", 450, 300)
graph.Render(chart.SVG, plt)
plt.Display(plot.None)
<-plt.Closed
```

Output:

plot

## Math Functions

```go
import (
	"github.com/rocketlaunchr/dataframe-go/math/funcs"
	"github.com/rocketlaunchr/dataframe-go/utils"
)

res := 24
// x holds 1..24; y is preallocated and filled in by Evaluate.
sx := dataframe.NewSeriesFloat64("x", nil, utils.Float64Seq(1, float64(res), 1))
sy := dataframe.NewSeriesFloat64("y", &dataframe.SeriesInit{Size: res})
df := dataframe.NewDataFrame(sx, sy)

fn := funcs.RegFunc("sin(2*𝜋*x/24)")
funcs.Evaluate(ctx, df, fn, 1) // write the result into series 1 ("y")
```

[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/f4GfS2rUjaM)

Output:

sine wave
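
To make the numbers behind the plot concrete, here is a minimal standard-library sketch of what the expression `sin(2*𝜋*x/24)` evaluates to for x = 1..24. It does not use the `funcs` sub-package at all; the helper names `sineSeries` and `approxEqual` are hypothetical and exist only for this illustration.

```go
package main

import (
	"fmt"
	"math"
)

// sineSeries returns x = 1..n and y = sin(2*pi*x/period) for each x.
// Hypothetical helper for illustration; not part of dataframe-go.
func sineSeries(n int, period float64) (xs, ys []float64) {
	xs = make([]float64, n)
	ys = make([]float64, n)
	for i := 0; i < n; i++ {
		x := float64(i + 1)
		xs[i] = x
		ys[i] = math.Sin(2 * math.Pi * x / period)
	}
	return xs, ys
}

// approxEqual reports whether a and b differ by at most tol.
func approxEqual(a, b, tol float64) bool {
	return math.Abs(a-b) <= tol
}

func main() {
	// Same setup as the example above: 24 points, period 24.
	xs, ys := sineSeries(24, 24)
	for i := range xs {
		fmt.Printf("x=%2.0f  y=% .4f\n", xs[i], ys[i])
	}
}
```

With 24 points and a period of 24, the series covers exactly one cycle: the peak falls at x = 6 and the trough at x = 18.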

## Importing Data

The `imports` sub-package has support for importing csv, jsonl, parquet, and directly from a SQL database. The `DictateDataType` option can be set to specify the true underlying data type. Alternatively, the `InferDataTypes` option can be set.

### CSV

```go
csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-05-07,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
df, err := imports.LoadFromCSV(ctx, strings.NewReader(csvStr))

OUTPUT:
+-----+----------------+------------+-------+---------+-------+
|     |    COUNTRY     |    DATE    |  AGE  | AMOUNT  |  ID   |
+-----+----------------+------------+-------+---------+-------+
| 0:  | United States  | 2012-02-01 |  50   |  112.1  | 1234  |
| 1:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 2:  | United Kingdom | 2012-02-01 |  17   |  18.2   | 12345 |
| 3:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 4:  | United Kingdom | 2012-05-07 |  NaN  |  18.2   | 12345 |
| 5:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 6:  | United States  | 2012-02-01 |  32   | 321.31  | 54320 |
| 7:  |     Spain      | 2012-02-01 |  66   | 555.42  |  241  |
+-----+----------------+------------+-------+---------+-------+
| 8X5 |     STRING     |    TIME    | INT64 | FLOAT64 | INT64 |
+-----+----------------+------------+-------+---------+-------+
```

[![Go Playground](https://img.shields.io/badge/Go-Playground-5593c7.svg?labelColor=41c3f3&style=for-the-badge)](https://play.golang.org/p/7hyUXnRy1pR)

## Exporting Data

The `exports` sub-package has support for exporting to csv, jsonl, parquet, Excel and directly to a SQL database.

## Optimizations

* If you know the number of rows in advance, you can set the capacity of the underlying slice of a series using `SeriesInit{}`. This will preallocate memory and provide speed improvements.

# Generic Series

Out of the box, there is support for `string`, `time.Time`, `float64` and `int64`. Automatic support exists for `float32` and all types of integers. There is a convenience function provided for dealing with `bool`. There is also support for `complex128` inside the `xseries` subpackage.

There may be times that you want to use your own custom data types. You can either implement your own `Series` type (more performant) or use the **Generic Series** (more convenient).

## civil.Date

```go
import (
	"time"

	"cloud.google.com/go/civil"
)

// The second argument is a sample value that tells the series its concrete type.
sg := dataframe.NewSeriesGeneric("date", civil.Date{}, nil,
	civil.Date{2018, time.May, 01},
	civil.Date{2018, time.May, 02},
	civil.Date{2018, time.May, 03})
s2 := dataframe.NewSeriesFloat64("sales", nil, 50.3, 23.4, 56.2)

df := dataframe.NewDataFrame(sg, s2)

OUTPUT:
+-----+------------+---------+
|     |    DATE    |  SALES  |
+-----+------------+---------+
| 0:  | 2018-05-01 |  50.3   |
| 1:  | 2018-05-02 |  23.4   |
| 2:  | 2018-05-03 |  56.2   |
+-----+------------+---------+
| 3X2 | CIVIL DATE | FLOAT64 |
+-----+------------+---------+
```

# Tutorial

## Create some fake data

Let's create a list of 8 "fake" employees with a name, title and base hourly wage rate.

```go
import (
	"time"

	"golang.org/x/exp/rand"

	"github.com/rocketlaunchr/dataframe-go/utils/faker"
)

src := rand.NewSource(uint64(time.Now().UTC().UnixNano()))
df := faker.NewDataFrame(8, src, faker.S("name", 0, "Name"), faker.S("title", 0.5, "JobTitle"), faker.S("base rate", 0, "Number", 15, 50))
```

```go
+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    42     |
| 1:  | Nickolas Emard |      NaN       |    22     |
| 2:  | Hollis Dickens | Representative |    22     |
| 3:  | Stacy Dietrich |      NaN       |    43     |
| 4:  |  Aleen Legros  |    Officer     |    21     |
| 5:  |  Adelia Metz   |   Architect    |    18     |
| 6:  | Sunny Gerlach  |      NaN       |    28     |
| 7:  | Austin Hackett |      NaN       |    39     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+
```

## Apply Function

Let's give a promotion to everyone by doubling their salary.

```go
s := df.Series[2] // the "base rate" series

applyFn := dataframe.ApplySeriesFn(func(val interface{}, row, nRows int) interface{} {
	return 2 * val.(int64)
})

dataframe.Apply(ctx, s, applyFn, dataframe.FilterOptions{InPlace: true})
```

```go
+-----+----------------+----------------+-----------+
|     |      NAME      |     TITLE      | BASE RATE |
+-----+----------------+----------------+-----------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |
| 1:  | Nickolas Emard |      NaN       |    44     |
| 2:  | Hollis Dickens | Representative |    44     |
| 3:  | Stacy Dietrich |      NaN       |    86     |
| 4:  |  Aleen Legros  |    Officer     |    42     |
| 5:  |  Adelia Metz   |   Architect    |    36     |
| 6:  | Sunny Gerlach  |      NaN       |    56     |
| 7:  | Austin Hackett |      NaN       |    78     |
+-----+----------------+----------------+-----------+
| 8X3 |     STRING     |     STRING     |   INT64   |
+-----+----------------+----------------+-----------+
```

## Create a Time series

Let's inform all employees separately on sequential days.

```go
import "github.com/rocketlaunchr/dataframe-go/utils/utime"

// One entry per day ("1D"), starting now, for 8 rows.
mts, _ := utime.NewSeriesTime(ctx, "meeting time", "1D", time.Now().UTC(), false, utime.NewSeriesTimeOptions{Size: &[]int{8}[0]})
df.AddSeries(mts, nil)
```

```go
+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |   2020-02-02 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 1:  | Nickolas Emard |      NaN       |    44     |   2020-02-03 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 2:  | Hollis Dickens | Representative |    44     |   2020-02-04 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 3:  | Stacy Dietrich |      NaN       |    86     |   2020-02-05 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 4:  |  Aleen Legros  |    Officer     |    42     |   2020-02-06 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 5:  |  Adelia Metz   |   Architect    |    36     |   2020-02-07 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 6:  | Sunny Gerlach  |      NaN       |    56     |   2020-02-08 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 7:  | Austin Hackett |      NaN       |    78     |   2020-02-09 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
+-----+----------------+----------------+-----------+--------------------------------+
| 8X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+
```

## Filtering

Let's pick out our senior employees (they have titles) for no reason.

```go
filterFn := dataframe.FilterDataFrameFn(func(vals map[interface{}]interface{}, row, nRows int) (dataframe.FilterAction, error) {
	if vals["title"] == nil {
		return dataframe.DROP, nil
	}
	return dataframe.KEEP, nil
})

seniors, _ := dataframe.Filter(ctx, df, filterFn)
```

```go
+-----+----------------+----------------+-----------+--------------------------------+
|     |      NAME      |     TITLE      | BASE RATE |          MEETING TIME          |
+-----+----------------+----------------+-----------+--------------------------------+
| 0:  | Cordia Jacobi  |   Consultant   |    84     |   2020-02-02 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 1:  | Hollis Dickens | Representative |    44     |   2020-02-04 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 2:  |  Aleen Legros  |    Officer     |    42     |   2020-02-06 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
| 3:  |  Adelia Metz   |   Architect    |    36     |   2020-02-07 23:13:53.015324   |
|     |                |                |           |           +0000 UTC            |
+-----+----------------+----------------+-----------+--------------------------------+
| 4X4 |     STRING     |     STRING     |   INT64   |              TIME              |
+-----+----------------+----------------+-----------+--------------------------------+
```

## Other useful packages

- [awesome-svelte](https://github.com/rocketlaunchr/awesome-svelte) - Resources for killing react
- [dbq](https://github.com/rocketlaunchr/dbq) - Zero boilerplate database operations for Go
- [electron-alert](https://github.com/rocketlaunchr/electron-alert) - SweetAlert2 for Electron Applications
- [google-search](https://github.com/rocketlaunchr/google-search) - Scrape google search results
- [igo](https://github.com/rocketlaunchr/igo) - A Go transpiler with cool new syntax such as fordefer (defer for for-loops)
- [mysql-go](https://github.com/rocketlaunchr/mysql-go) - Properly cancel slow MySQL queries
- [react](https://github.com/rocketlaunchr/react) - Build front end applications using Go
- [remember-go](https://github.com/rocketlaunchr/remember-go) - Cache slow database queries
- [testing-go](https://github.com/rocketlaunchr/testing-go) - Testing framework for unit testing

#

### Legal Information

The license is a modified MIT license. Refer to `LICENSE` file for more details.

**© 2018-21 PJ Engineering and Business Solutions Pty. Ltd.**