Tidyverse

From Wikipedia, the free encyclopedia
Repository: github.com/tidyverse/tidyverse
Written in: R
Type: Package collection
License: MIT
Website: www.tidyverse.org

The tidyverse is a collection of open-source packages for the R programming language, introduced by Hadley Wickham[1] and his team, that "share an underlying design philosophy, grammar, and data structures" of tidy data.[2] Characteristic features of tidyverse packages include extensive use of non-standard evaluation and the encouragement of piping.[3][4][5]
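The two features named above, piping and non-standard evaluation, can be illustrated with a minimal sketch. It assumes the dplyr package is installed; the `scores` data frame and its column names are hypothetical and chosen only for illustration.

```r
library(dplyr)

# Hypothetical example data (not from any real dataset)
scores <- data.frame(
  student = c("Ana", "Ben", "Cy", "Dee"),
  group   = c("a", "a", "b", "b"),
  score   = c(90, 70, 85, 60)
)

# Piping: each step passes its result to the next via %>%.
# Non-standard evaluation: `score` and `group` are written as bare
# column names and are looked up inside the data frame, not in the
# calling environment.
result <- scores %>%
  filter(score >= 70) %>%
  group_by(group) %>%
  summarise(mean_score = mean(score))

print(result)
# group a: mean_score 80; group b: mean_score 85
```

Written without the pipe, the same computation would nest the calls inside one another, which is the readability difference the piping style is meant to address.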

As of November 2018, the tidyverse package and some of its individual packages comprised 5 of the 10 most downloaded R packages.[6] The tidyverse is the subject of multiple books and papers.[7][8][9][10] In 2019, a paper describing the ecosystem was published in the Journal of Open Source Software.[11]

Its syntax has been referred to as "supremely readable",[12] and some[13] have argued that the tidyverse is an effective way to introduce complete beginners to programming because it allows students to begin doing data processing tasks quickly.[14][13] Some practitioners have also pointed out that data processing tasks are intuitively easier to chain together with the tidyverse than with pandas, Python's equivalent data processing package.[15] There is also an active R community around the tidyverse. For example, the TidyTuesday social data project, organised by the Data Science Learning Community (DSLC),[16] releases varied real-world datasets each week so that community members can participate, share, practice, and learn to work with data more easily.[17] Critics of the tidyverse have argued that it promotes tools that are harder to teach and learn than their built-in base R equivalents and that are too dissimilar to some other programming languages.[18][19]

More generally, the tidyverse principles encourage a universe of streamlined packages that, in principle, helps alleviate dependency issues and compatibility problems with current and future features.[20] An example of such a tidyverse-principled approach is the pharmaverse, a collection of R packages for clinical reporting in the pharmaceutical industry.[21]

Packages

The core tidyverse packages, which provide functionality to model, transform, and visualize data, include:[22]

  • ggplot2 – for data visualization
  • dplyr – for wrangling and transforming data
  • tidyr – helps transform data into tidy data, where each variable is a column, each observation is a row, and each value is a cell
  • readr – helps read in common delimited text files containing data
  • purrr – a functional programming toolkit
  • tibble – a modern implementation of the built-in data frame data structure
  • stringr – helps manipulate string data types
  • forcats – helps manipulate categorical data types (factors)
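As a rough illustration of several of the packages listed above, the following sketch reshapes a small table into tidy form and then applies one function each from stringr, forcats, and purrr. It assumes tidyr, tibble, stringr, forcats, and purrr are installed; the `wide` table and all values in it are hypothetical.

```r
library(tidyr)
library(stringr)
library(forcats)
library(purrr)

# Hypothetical "wide" table: the variable "year" is spread across
# column names instead of being stored in its own column
wide <- tibble::tibble(
  country = c("X", "Y"),
  `2019`  = c(10, 20),
  `2020`  = c(11, 22)
)

# tidyr: reshape so each variable is a column and each observation a row
tidy <- pivot_longer(
  wide,
  cols      = c(`2019`, `2020`),
  names_to  = "year",
  values_to = "value"
)

# stringr: convert strings to upper case
shouting <- str_to_upper(c("ggplot2", "dplyr"))

# forcats: reorder factor levels by how often each level occurs
f <- fct_infreq(factor(c("b", "a", "b", "c", "b", "a")))

# purrr: apply a function to each element, returning a numeric vector
squares <- map_dbl(1:3, function(x) x^2)

print(tidy)       # 4 rows with columns country, year, value
print(levels(f))  # "b" "a" "c"
print(squares)    # 1 4 9
```

After reshaping, each row of `tidy` holds one country-year observation, which is the layout the tidy data definition describes.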

Additional packages assist the core collection.[23] Other packages based on the tidy data principles are regularly developed, such as tidytext[24] for text analysis, tidymodels[25] for machine learning, or tidyquant[26] for financial operations.

References

  1. ^ "Welcome to the Tidyverse". Revolutions. Retrieved 2018-11-26.
  2. ^ "Tidyverse". www.tidyverse.org. Retrieved 2018-11-26.
  3. ^ Bache, Stefan Milton; Wickham, Hadley (2014-11-22), magrittr: A Forward-Pipe Operator for R, retrieved 2020-04-20
  4. ^ Wickham, Hadley. 4 Pipes | The tidyverse style guide.
  5. ^ Wickham, Hadley (May 30, 2019). Advanced R (2nd ed.). New York: Chapman & Hall. ISBN 978-0815384571.
  6. ^ "RDocumentation". www.rdocumentation.org. Retrieved 2018-11-26.
  7. ^ Duggan, Jim (2018-09-07). "Input and output data analysis for system dynamics modelling using the tidyverse libraries of R". System Dynamics Review. 34 (3): 438–461. doi:10.1002/sdr.1600. hdl:10379/15029. ISSN 0883-7066. S2CID 70005357.
  8. ^ Chang, Winston (2013). R Graphics Cookbook. O'Reilly Media, Inc. ISBN 9781449316952.
  9. ^ Boehmke, Bradley C. (2016-11-17). Data wrangling with R. Cham. ISBN 9783319455990. OCLC 964404346.
  10. ^ Wickham, Hadley; Grolemund, Garrett (2017). R for data science: import, tidy, transform, visualize, and model data (First ed.). Sebastopol, CA. ISBN 9781491910399. OCLC 968213225.
  11. ^ Wickham, Hadley; Averick, Mara; Bryan, Jennifer; Chang, Winston; McGowan, Lucy D'Agostino; François, Romain; Grolemund, Garrett; Hayes, Alex; Henry, Lionel; Hester, Jim; Kuhn, Max; Pedersen, Thomas Lin; Miller, Evan; Bache, Stephan Milton; Müller, Kirill; Ooms, Jeroen; Robinson, David; Seidel, Dana Paige; Spinu, Vitalie; Takahashi, Kohske; Vaughan, Davis; Wilke, Claus; Woo, Kara; Yutani, Hiroaki (21 November 2019). "Welcome to the Tidyverse". Journal of Open Source Software. 4 (43): 1686. Bibcode:2019JOSS....4.1686W. doi:10.21105/joss.01686. S2CID 214002773.
  12. ^ Steinmetz, Art (2024-04-10). "Outsider Data Science - The Truth About Tidy Wrappers". outsiderdata.netlify.app. Retrieved 2024-04-11.
  13. ^ a b Heppler, Jason (2018-02-27). "Teaching the tidyverse to R novices". Medium. Retrieved 2023-08-24.
  14. ^ "Teach the tidyverse to beginners". Variance Explained. 5 July 2017. Retrieved 2022-07-15.
  15. ^ "Why pandas feels clunky when coming from R". Rasmus Bååth's Blog. Retrieved 2024-03-30.
  16. ^ "dslc.io". dslc.io. Retrieved 2024-08-11.
  17. ^ rfordatascience/tidytuesday, Data Science Learning Community, 2024-08-11, retrieved 2024-08-11
  18. ^ Matloff, Norm (30 September 2019). "An opinionated view of the Tidyverse "dialect" of the R language". GitHub. Retrieved 28 October 2019.
  19. ^ Muenchen, Bob (23 March 2017). "The Tidyverse Curse". r4stats.com.
  20. ^ "The Power of Transitioning to a '-verse' Approach in R Package Development". www.appsilon.com. Retrieved 2024-08-11.
  21. ^ "pharmaverse". pharmaverse.org. Retrieved 2024-08-11.
  22. ^ "Tidyverse packages - Tidyverse". Retrieved 2018-11-26.
  23. ^ "Tidyverse packages". www.tidyverse.org. Retrieved 2020-12-22.
  24. ^ Silge, Julia (2023-02-01), tidytext: Text mining using tidy tools, retrieved 2023-02-03
  25. ^ "Tidymodels". www.tidymodels.org. Retrieved 2023-02-03.
  26. ^ "Tidy Quantitative Financial Analysis". business-science.github.io. Retrieved 2023-02-03.