The packages dplyr
uses the SQL concepts of left, right, full, inner, semi, anti and cross joins. The following table gives the correspondence between the Stata commands merge
and joinby
with the dplyr functions:
Stata | merge v1, keep(master matched using) |
dplyr | full_join(df1, df2, by = "v1") |
Stata | merge v1, keep(master matched) |
dplyr | left_join(df1, df2, by = "v1") |
Stata | merge v1, keep(matched using) |
dplyr | right_join(df1, df2, by = "v1") |
Stata | merge v1, keep(matched) |
dplyr | inner_join(df1, df2, by = "v1") |
Stata | merge v1, keep(matched) keepusing(v1) |
dplyr | semi_join(df1, df2, by = "v1") |
Stata | merge v1, keep(master) keepusing(v1) |
dplyr | anti_join(df1, df2) |
Stata | crossby |
dplyr | cross_join(df1, df2) |
Wen there are multiple matches both in the master and using datasets, these functions create all combinations of rows, similarly to Stata joinby
(and contrary to Stata merge m:m
).
To append
one dataset to another:
Stata | append using "using.dta" |
dplyr | row_binds(df1, df2) |
When string columns are append too numeric columns, Stata returns an error (and with the option force
, it replaces the using dataset to missing values).
In contrast, in R, columns with different types are coerced to their higher type (logical < strings < int < real < character).
The package tidyr
includes the functions pivot_longer
and pivot_wider
that alternate between wide and long datasets.
From wide to long
dfwide | –> | dflong |
---|---|---|
id stub1 stub2 1 4.1 4.5 2 4.5 3.0 |
–> | id variable value 1 stub1 4.1 1 stub2 4.5 2 stub1 3.3 2 stub2 3.0 |
Stata | reshape long stub, i(id) j(variable) string |
tidyr | pivot_longer(dfwide, starts_with("stub"), names_to = "variable", values_to = "value") |
The names in red specify the name of new variables. While reshape long
requires to specify both the variable to reshape and the key variables, picot_longer
only requires to specify the variables to reshape - the key variable are inferred to be the remaining ones.
From long to wide
dflong | –> | dfwide |
---|---|---|
id variable value 1 stub1 4.1 1 stub2 4.5 2 stub1 3.3 2 stub2 3.0 |
–> | id stub1 stub2 1 4.1 4.5 2 4.5 3.0 |
Stata | reshape wide value, i(i) j(variable) string |
tidyr | pivot_wider(dflong, names_from = variable, values_from = value) |