A data frame is said to be folded when some of its cells contain
multiple elements. These are often encoded as a semicolon-separated
character string, such as "a;b". This function transforms the data
frame so that "a" and "b" are split and recorded across two rows.
The simple example below illustrates a trivial case, in which the table
X | Y |
1 | a;b |
2 | c |
is unfolded based on the Y variable and becomes
X | Y |
1 | a |
1 | b |
2 | c |
where the value 1 of variable X is now duplicated.
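The core idea can be sketched in base R with strsplit() and rep(). This is a minimal sketch under stated assumptions, not how unfoldDataFrame() is implemented: it assumes a plain data.frame with a character column k, ignores any other folded columns, and unfold_one() is a hypothetical name used only for illustration.

```r
## Hypothetical helper, NOT the package's implementation: unfold the single
## character column k of a plain data.frame.
unfold_one <- function(x, k, split = ";") {
    parts <- strsplit(x[[k]], split)   # one character vector per row of x
    n <- lengths(parts)                # number of elements in each cell
    res <- x[rep(seq_len(nrow(x)), n), , drop = FALSE]  # row i repeated n[i] times
    res[[k]] <- unlist(parts)          # one element per new row, in order
    rownames(res) <- NULL              # drop the "1.1"-style row names
    res
}

df <- data.frame(X = 1:2, Y = c("a;b", "c"), stringsAsFactors = FALSE)
unfold_one(df, "Y")
```

Each row is repeated as many times as its cell in Y has elements, which is why the value 1 of X appears twice.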
If a second variable follows the same pattern as the one used to unfold the table, it is unfolded as well.
X | Y | Z |
1 | a;b | x;y |
2 | c | z |
becomes
X | Y | Z |
1 | a | x |
1 | b | y |
2 | c | z |
because it is implied that the elements in "a;b" are matched to those in "x;y" by their respective indices. Note that, in the above example, unfolding by Y or by Z produces the same result.
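A base-R sketch of this matching behaviour follows. It is an assumption inferred from the examples on this page, not a documented rule: another character column is unfolded together with k only when each of its cells contains the same number of elements as the corresponding cell of k. unfold_matching() is a hypothetical helper, not the package's code.

```r
## Hypothetical helper, NOT the package's implementation: unfold column k and
## also unfold any other character column whose cells contain the same number
## of split elements as the corresponding cells of k.
unfold_matching <- function(x, k, split = ";") {
    parts <- strsplit(x[[k]], split)          # elements of each cell of k
    n <- lengths(parts)                       # element count per row
    res <- x[rep(seq_len(nrow(x)), n), , drop = FALSE]  # duplicate rows
    res[[k]] <- unlist(parts)
    for (v in setdiff(names(x), k)) {
        if (!is.character(x[[v]])) next       # only character columns are split
        vparts <- strsplit(x[[v]], split)
        if (all(lengths(vparts) == n))        # same pattern: match elements by index
            res[[v]] <- unlist(vparts)
    }                                         # otherwise v is simply duplicated
    rownames(res) <- NULL
    res
}

x1 <- data.frame(X = 1:2, Y = c("a;b", "c"), Z = c("x;y", "z"),
                 stringsAsFactors = FALSE)
unfold_matching(x1, "Y")
```

Under this assumption, unfolding x1 by "Z" gives the same table, mirroring the statement above.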
However, the following table unfolded by Y
X | Y | Z |
1 | a;b | x;y |
2 | c | x;y |
produces
X | Y | Z |
1 | a | x;y |
1 | b | x;y |
2 | c | x;y |
because, along the second row, "c" (one element) and "x;y" (two elements) don't match. In this case, unfolding by Z would produce a different result. These examples are also illustrated below.
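The difference can be checked by counting the elements in each cell. The base-R sketch below only counts; it assumes, as the examples suggest, that the number of rows after unfolding equals the total number of elements in the chosen column, and that two columns are unfolded together only when their per-row counts agree.

```r
x2 <- data.frame(X = 1:2, Y = c("a;b", "c"), Z = c("x;y", "x;y"),
                 stringsAsFactors = FALSE)
## Number of ';'-separated elements in each cell of the folded columns
n_elements <- sapply(x2[c("Y", "Z")], function(col) lengths(strsplit(col, ";")))
n_elements                                   # column Y: 2, 1; column Z: 2, 2
## The counts differ in row 2, so Y and Z are not unfolded together
all(n_elements[, "Y"] == n_elements[, "Z"])
## Rows after unfolding by Y or by Z: the total element count of that column
colSums(n_elements)
```

These totals (3 when unfolding by Y, 4 when unfolding by Z) agree with the example outputs further down.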
Note that there is no foldDataFrame() function. See reduceDataFrame()
and expandDataFrame() to flexibly encode and handle vectors of
length > 1 within cells.
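Purely for illustration, the reverse operation on a plain data.frame can be approximated in base R by collapsing a column within groups using aggregate() and paste(). fold_by() is a hypothetical helper, not part of the package; for the package's own approach, see the functions mentioned above.

```r
## Hypothetical helper, NOT part of the package: collapse column k into
## sep-separated strings within each group defined by the columns in `by`.
fold_by <- function(x, by, k, sep = ";") {
    ## aggregate() forwards `collapse = sep` to paste(); groups come out sorted
    aggregate(x[k], by = x[by], FUN = paste, collapse = sep)
}

unfolded <- data.frame(X = c(1L, 1L, 2L), Y = c("a", "b", "c"),
                       stringsAsFactors = FALSE)
fold_by(unfolded, by = "X", k = "Y")
```

This sketch folds a single column and returns the groups in sorted order rather than the original row order.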
unfoldDataFrame(x, k, split = ";")
x: A DataFrame or data.frame to be unfolded.
k: A character(1) naming a character variable in x that will be
used to unfold x.
split: A character(1) passed to strsplit() to split x[[k]].
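Because split is handed to strsplit(), it is presumably interpreted as a regular expression unless fixed = TRUE is also passed, which the text does not say. The base-R calls below illustrate why this matters for separators that are regex metacharacters; they call strsplit() directly, not unfoldDataFrame().

```r
## Default ";" separator: not a regex metacharacter, splits as expected.
strsplit("a;b", ";")[[1]]
## "." is a regex metacharacter matching any character, so every character
## acts as a separator and only empty strings remain.
strsplit("a.b", ".")[[1]]
## Two ways to split on a literal ".": escape it, or disable regex matching.
strsplit("a.b", "[.]")[[1]]
strsplit("a.b", ".", fixed = TRUE)[[1]]
```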
Returns a new, unfolded object of class class(x), with a number of
rows >= nrow(x) and the same columns as x.
(x0 <- DataFrame(X = 1:2, Y = c("a;b", "c")))
#> DataFrame with 2 rows and 2 columns
#>           X           Y
#>   <integer> <character>
#> 1         1         a;b
#> 2         2           c
unfoldDataFrame(x0, "Y")
#> DataFrame with 3 rows and 2 columns
#>           X           Y
#>   <integer> <character>
#> 1         1           a
#> 2         1           b
#> 3         2           c
(x1 <- DataFrame(X = 1:2, Y = c("a;b", "c"), Z = c("x;y", "z")))
#> DataFrame with 2 rows and 3 columns
#>           X           Y           Z
#>   <integer> <character> <character>
#> 1         1         a;b         x;y
#> 2         2           c           z
unfoldDataFrame(x1, "Y")
#> DataFrame with 3 rows and 3 columns
#>           X           Y           Z
#>   <integer> <character> <character>
#> 1         1           a           x
#> 2         1           b           y
#> 3         2           c           z
unfoldDataFrame(x1, "Z") ## same
#> DataFrame with 3 rows and 3 columns
#>           X           Y           Z
#>   <integer> <character> <character>
#> 1         1           a           x
#> 2         1           b           y
#> 3         2           c           z
(x2 <- DataFrame(X = 1:2, Y = c("a;b", "c"), Z = c("x;y", "x;y")))
#> DataFrame with 2 rows and 3 columns
#>           X           Y           Z
#>   <integer> <character> <character>
#> 1         1         a;b         x;y
#> 2         2           c         x;y
unfoldDataFrame(x2, "Y")
#> DataFrame with 3 rows and 3 columns
#>           X           Y           Z
#>   <integer> <character> <character>
#> 1         1           a         x;y
#> 2         1           b         x;y
#> 3         2           c         x;y
unfoldDataFrame(x2, "Z") ## different
#> DataFrame with 4 rows and 3 columns
#>           X           Y           Z
#>   <integer> <character> <character>
#> 1         1         a;b           x
#> 2         1         a;b           y
#> 3         2           c           x
#> 4         2           c           y