null - 程序员宅基地

R语言包_dplyr_2_r语言print(n = )-程序员宅基地

技术标签： r语言

Loading dataset
Choosing columns select rename
Choosing rows filter between slice sample_n top_n distinct
Adding new variables mutate transmute add_rownames
Grouping and counting summarise tally count group_size n_groups ungroup
Creating data frames data_frame
Joining merging tables left_join right_join inner_join full_join semi_join anti_join
Viewing more output print View
参考资料

Loading dataset

# remove flights data if you just finished my previous tutorial
rm(flights)
# load packages
suppressMessages(library(dplyr))
library(nycflights13)

# print the flights dataset from nycflights13
flights

Choosing columns: select, rename

# besides just using select() to pick columns...
flights %>% select(carrier, flight)

# ...you can use the minus sign to hide columns
flights %>% select(-month, -day)
# hide a range of columns
flights %>% select(-(dep_time:arr_delay))

# hide any column with a matching name
flights %>% select(-contains("time"))
# pick columns using a character vector of column names
cols <- c("carrier", "flight", "tailnum")
flights %>% select(one_of(cols))
# select() can be used to rename columns, though all columns not mentioned are dropped
flights %>% select(tail = tailnum)

# rename() does the same thing, except all columns not mentioned are kept
flights %>% rename(tail = tailnum)

Choosing rows: filter, between, slice, sample_n, top_n, distinct

# filter() supports the use of multiple conditions
flights %>% filter(dep_time >= 600, dep_time <= 605)
# between() is a concise alternative for determing if numeric values fall in a range
flights %>% filter(between(dep_time, 600, 605))

# side note: is.na() can also be useful when filtering
flights %>% filter(!is.na(dep_time))
# slice() filters rows by position
flights %>% slice(1000:1005)

# keep the first three rows within each group
flights %>% group_by(month, day) %>% slice(1:3)

# sample three rows from each group
flights %>% group_by(month, day) %>% sample_n(3)

# keep three rows from each group with the top dep_delay
flights %>% group_by(month, day) %>% top_n(3, dep_delay)

# also sort by dep_delay within each group
flights %>% group_by(month, day) %>% top_n(3, dep_delay) %>% arrange(desc(dep_delay))
# unique rows can be identified using unique() from base R
flights %>% select(origin, dest) %>% unique()
# dplyr provides an alternative that is more "efficient"
flights %>% select(origin, dest) %>% distinct()

# side note: when chaining, you don't have to include the parentheses if there are no arguments
flights %>% select(origin, dest) %>% distinct

Adding new variables: mutate, transmute, add_rownames

# mutate() creates a new variable (and keeps all existing variables)
flights %>% mutate(speed = distance/air_time*60)

# transmute() only keeps the new variables
flights %>% transmute(speed = distance/air_time*60)
# example data frame with row names
mtcars %>% head()

# add_rownames() turns row names into an explicit variable
mtcars %>% add_rownames("model") %>% head()

# side note: dplyr no longer prints row names (ever) for local data frames
mtcars %>% tbl_df()

Grouping and counting: summarise, tally, count, group_size, n_groups, ungroup

# summarise() can be used to count the number of rows in each group
flights %>% group_by(month) %>% summarise(cnt = n())
# tally() and count() can do this more concisely
flights %>% group_by(month) %>% tally()
flights %>% count(month)
# you can sort by the count
flights %>% group_by(month) %>% summarise(cnt = n()) %>% arrange(desc(cnt))
# tally() and count() have a sort parameter for this purpose
flights %>% group_by(month) %>% tally(sort=TRUE)
flights %>% count(month, sort=TRUE)
# you can sum over a specific variable instead of simply counting rows
flights %>% group_by(month) %>% summarise(dist = sum(distance))
# tally() and count() have a wt parameter for this purpose
flights %>% group_by(month) %>% tally(wt = distance)
flights %>% count(month, wt = distance)
# group_size() returns the counts as a vector
flights %>% group_by(month) %>% group_size()

# n_groups() simply reports the number of groups
flights %>% group_by(month) %>% n_groups()
# group by two variables, summarise, arrange (output is possibly confusing)
flights %>% group_by(month, day) %>% summarise(cnt = n()) %>% arrange(desc(cnt)) %>% print(n = 40)

# ungroup() before arranging to arrange across all groups
flights %>% group_by(month, day) %>% summarise(cnt = n()) %>% ungroup() %>% arrange(desc(cnt))
flights %>% group_by(month, day) %>% summarise(cnt = n()) %>% arrange(desc(cnt))  %>% filter(month==7)

Creating data frames: data_frame

data_frame() is a better way than data.frame() for creating data frames. Benefits of data_frame():

You can use previously defined columns to compute new columns.
It never coerces column types.
It never munges column names.
It never adds row names.
It only recycles length 1 input.
It returns a local data frame (a tbl_df).

# data_frame() example
x = data_frame(a = 1:6, b = a*2, c = 'string', 'd+e' = 1) %>% glimpse()
# data.frame() example
x1 = data.frame(a = 1:6, c = 'string', 'd+e' = 1) %>% glimpse()

Joining (merging) tables: left_join, right_join, inner_join, full_join, semi_join, anti_join

# create two simple data frames
(a <- data_frame(color = c("green","yellow","red"), num = 1:3))
(b <- data_frame(color = c("green","yellow","pink"), size = c("S","M","L")))

# only include observations found in both "a" and "b" (automatically joins on variables that appear in both tables)
inner_join(a, b)

# include observations found in either "a" or "b"
full_join(a, b)

# include all observations found in "a"
left_join(a, b)

# include all observations found in "b"
right_join(a, b)

# right_join(a, b) is identical to left_join(b, a) except for column ordering
left_join(b, a)

# filter "a" to only show observations that match "b"
semi_join(a, b)

# filter "a" to only show observations that don't match "b"
anti_join(a, b)
# sometimes matching variables don't have identical names
b <- b %>% rename(col = color)

# specify that the join should occur by matching "color" in "a" with "col" in "b"
inner_join(a, b, by=c("color" = "col"))

Viewing more output: print, View

# specify that you want to see more rows
flights %>% print(n = 15)
# specify that you want to see ALL rows (don't run this!)
flights %>% print(n = Inf)
# specify that you want to see all columns
flights %>% print(width = Inf)
# show up to 1000 rows and all columns
flights %>% View()

# set option to see all columns and fewer rows
options(dplyr.width = Inf, dplyr.print_min = 6)

# reset options (or just close R)
options(dplyr.width = NULL, dplyr.print_min = 10)

参考资料

justmarkham的github

本文链接：https://blog.csdn.net/Young_Gy/article/details/48198985

原作者删帖不实内容删帖广告或垃圾文章投诉

智能推荐

hdu 1229 还是A+B（水）-程序员宅基地

文章浏览阅读122次。还是A+BTime Limit: 2000/1000 MS (Java/Others)Memory Limit: 65536/32768 K (Java/Others)Total Submission(s): 24568Accepted Submission(s): 11729Problem Description读入两个小于10000的正整数A和B，计算A+B。...

http客户端Feign——日志配置_feign 日志设置-程序员宅基地

文章浏览阅读419次。HEADERS：在BASIC的基础上，额外记录了请求和响应的头信息。FULL：记录所有请求和响应的明细，包括头信息、请求体、元数据。BASIC：仅记录请求的方法，URL以及响应状态码和执行时间。NONE：不记录任何日志信息，这是默认值。配置Feign日志有两种方式；方式二：java代码实现。注解中声明则代表某服务。方式一：配置文件方式。_feign 日志设置

[转载]将容器管理的持久性 Bean 用于面向服务的体系结构-程序员宅基地

文章浏览阅读155次。将容器管理的持久性 Bean 用于面向服务的体系结构本文将介绍如何使用 IBM WebSphere Process Server 对容器管理的持久性 (CMP) Bean的连接和持久性逻辑加以控制，使其可以存储在非关系数据库..._javax.ejb.objectnotfoundexception: no such entity!

基础java练习题（递归）_java 递归例题-程序员宅基地

文章浏览阅读1.5k次。基础java练习题一、递归实现跳台阶从第一级跳到第n级，有多少种跳法一次可跳一级，也可跳两级。还能跳三级import java.math.BigDecimal;import java.util.Scanner;public class Main{ public static void main(String[]args){ Scanner reader=new Scanner(System.in); while(reader.hasNext()){ _java 递归例题

面向对象程序设计（荣誉）实验一 String_对存储在string数组内的所有以字符‘a’开始并以字符‘e’结尾的单词做加密处理。-程序员宅基地

文章浏览阅读1.5k次，点赞6次，收藏6次。目录1.串应用- 计算一个串的最长的真前后缀题目描述输入输出样例输入样例输出题解2.字符串替换(string)题目描述输入输出样例输入样例输出题解3.可重叠子串 (Ver. I)题目描述输入输出样例输入样例输出题解4.字符串操作（string）题目描述输入输出样例输入样例输出题解1.串应用- 计算一个串的最长的真前后缀题目描述给定一个串，如ABCDAB，则ABCDAB的真前缀有：{ A, AB,ABC, ABCD, ABCDA }ABCDAB的真后缀有：{ B, AB,DAB, CDAB, BCDAB_对存储在string数组内的所有以字符‘a’开始并以字符‘e’结尾的单词做加密处理。

算法设计与问题求解/西安交通大学本科课程MOOC/C_算法设计与问题求解西安交通大学-程序员宅基地

文章浏览阅读68次。西安交通大学/算法设计与问题求解/树与二叉树/MOOC_算法设计与问题求解西安交通大学

随便推点

[Vue warn]: Computed property “totalPrice“ was assigned to but it has no setter._computed property "totalprice" was assigned to but-程序员宅基地

文章浏览阅读1.6k次。问题：在Vue项目中出现如下错误提示：[Vue warn]: Computed property "totalPrice" was assigned to but it has no setter. (found in <Anonymous>)代码：<input v-model="totalPrice"/>原因：v-model命令，因Vue 的双向数据绑定原理，会自动操作 totalPrice，对其进行set 操作而 totalPrice 作为计..._computed property "totalprice" was assigned to but it has no setter.

basic1003-我要通过！13行搞定：也许是全网最奇葩解法_basic 1003 case 1-程序员宅基地

文章浏览阅读60次。十分暴力而简洁的解决方式：读取P和T的位置并自动生成唯一正确答案，将题给测点与之对比，不一样就给我爬！_basic 1003 case 1

服务器浏览war文件,详解将Web项目War包部署到Tomcat服务器基本步骤-程序员宅基地

文章浏览阅读422次。原标题：详解将Web项目War包部署到Tomcat服务器基本步骤详解将Web项目War包部署到Tomcat服务器基本步骤1 War包War包一般是在进行Web开发时，通常是一个网站Project下的所有源码的集合，里面包含前台HTML/CSS/JS的代码，也包含Java的代码。当开发人员在自己的开发机器上调试所有代码并通过后，为了交给测试人员测试和未来进行产品发布，都需要将开发人员的源码打包成Wa..._/opt/bosssoft/war/medical-web.war/web-inf/web.xml of module medical-web.war.

python组成三位无重复数字_python组合无重复三位数的实例-程序员宅基地

文章浏览阅读3k次，点赞3次，收藏13次。# -*- coding: utf-8 -*-# 简述：这里有四个数字，分别是：1、2、3、4#提问：能组成多少个互不相同且无重复数字的三位数？各是多少？def f(n):list=[]count=0for i in range(1,n+1):for j in range(1, n+1):for k in range(1, n+1):if i!=j and j!=k and i!=k:list.a..._python求从0到9任意组合成三位数数字不能重复并输出

ElementUl中的el-table怎样吧0和1改变为男和女_elementui table 性别-程序员宅基地

文章浏览阅读1k次，点赞3次，收藏2次。<el-table-column prop="studentSex" label="性别" :formatter="sex"></el-table-column>然后就在vue的methods中写方法就OK了methods: { sex(row,index){ if(row.studentSex == 1){ return '男'; }else{ return '女'; }..._elementui table 性别

java文件操作之移动文件到指定的目录_java中怎么将pro.txt移动到design_mode_code根目录下-程序员宅基地

文章浏览阅读1.1k次。java文件操作之移动文件到指定的目录_java中怎么将pro.txt移动到design_mode_code根目录下