Do You Coding?

[library] Implementing split

📌 Manual (in subject)

๋”๋ณด๊ธฐ

Function name

     do_split

 

Prototype

     char **do_split(char const *s, char c);

 

Parameters

     s: The string to be split.
     c: The delimiter character.

 

Return value

     The array of new strings resulting from the split. NULL if the allocation fails.

 

External functs

     malloc, free

 

Description

     Allocates (with malloc(3)) and returns an array of strings obtained by splitting 's' using the character 'c' as a delimiter. The array must end with a NULL pointer.

📌 Code

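#include <stdlib.h>

/*
 * do_calloc, do_strdup and do_strlcpy come from the author's earlier posts.
 * These prototypes are assumed to mirror calloc(3), strdup(3) and strlcpy(3).
 */
void	*do_calloc(size_t count, size_t size);
char	*do_strdup(const char *s1);
size_t	do_strlcpy(char *dst, const char *src, size_t dstsize);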

/* Frees every string in a NULL-terminated array, then the array itself. */
static char	**do_free(char **s)
{
	int	i;

	i = 0;
	while (s[i])
	{
		free(s[i]);
		i ++;
	}
	free(s);
	return (NULL);
}

/* A word ends where a non-delimiter char is followed by c or '\0'. */
static int	do_wordcount(const char *s, char c)
{
	int	i;
	int	result;

	i = 0;
	result = 0;
	while (s[i] != '\0')
	{
		if (s[i] != c && (s[i + 1] == c || s[i + 1] == '\0'))
			result ++;
		i ++;
	}
	return (result);
}

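/*
 * Skips leading delimiters (counted in *del_len), measures the next word
 * (*len), and terminates it in place. Returns s, the segment start.
 */
static char	*do_strtok(char *s, char c, int *len, int *del_len)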
{
	char	*tok;

	*len = 0;
	*del_len = 0;
	tok = s;
	while (*tok == c)
	{
		(*del_len)++;
		tok ++;
	}
	while (*tok != c && *tok != '\0')
	{
		tok ++;
		(*len)++;
	}
	*tok = '\0';
	return (s);
}

/* Builds wordc + 1 slots (last one NULL) and copies each word into one. */
static char	**do_strtok_front(char *s, char c, int wordc)
{
	int		len;
	int		del;
	char	*str;
	char	**result;
	int		i;

	i = 1;
	result = (char **)do_calloc(sizeof(char *), (wordc + 1));
	if (result == NULL)
		return (NULL);
	str = do_strtok(s, c, &len, &del);
	result[0] = (char *)do_calloc(sizeof(char), (len + 1));
	if (result[0] == NULL)
		return (do_free(result));
	do_strlcpy(result[0], str + del, len + 1);
	while (i < wordc)
	{
		str = do_strtok(str + len + del + 1, c, &len, &del);
		result[i] = (char *)do_calloc(sizeof(char), (len + 1));
		if (result[i] == NULL)
			return (do_free(result));
		do_strlcpy(result[i++], str + del, len + 1);
	}
	return (result);
}

char	**do_split(char const *s, char c)
{
	char	**result;
	char	*str;
	int		wordc;

	if (s == NULL)
		return (NULL);
	str = do_strdup(s);
	if (str == NULL)
		return (NULL);
	wordc = do_wordcount(str, c);
	if (wordc == 0)
		result = (char **)do_calloc(sizeof(char *), 1);
	else
		result = do_strtok_front(str, c, wordc);
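	/* The copy is not needed on either path: free it before the NULL check
	 * so that an allocation failure does not leak str. */
	free(str);
	if (result == NULL)
		return (NULL);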
	return (result);
}

 

📌 Code review

I built, in C, the split function that languages such as Python and Java provide.

The basic mechanism is modeled on strtok from string.h, a function that behaves similarly to split.

The function takes a string (s), splits it into words on a given character (c), and stores each word in its own slot of a NULL-terminated array of strings (char **).

 

First, let's start with do_split.

The string arrives as const, but the algorithm modifies it in place, so we first copy it with do_strdup, which I wrote earlier.

That copy is str; if str could not be allocated, we handle the error by returning NULL.
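The copy matters because do_strtok later writes '\0' into the buffer, and writing through a const pointer into, for example, a string literal is undefined behavior. A minimal sketch of the copy step, assuming a hypothetical `dup_str` as a stand-in for the author's do_strdup:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for do_strdup: returns a heap copy of s, or NULL. */
static char	*dup_str(const char *s)
{
	size_t	n = strlen(s) + 1;	/* include the terminating '\0' */
	char	*copy = malloc(n);

	if (copy == NULL)
		return (NULL);
	memcpy(copy, s, n);
	return (copy);
}
```

The caller owns the copy and may cut it with '\0' freely; the original const string is never touched.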

 

Next, we call the newly written do_wordcount and store its result in wordc. This function scans the string and counts the words separated by the delimiter: a word ends wherever a non-delimiter character is followed by the delimiter or by the end of the string. It returns the number of words.
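The counting rule can be checked on its own. A sketch that uses the same condition as do_wordcount, under a hypothetical name `count_words`:

```c
#include <stddef.h>

/* Same rule as do_wordcount: count each position where a non-delimiter
 * character is followed by the delimiter or by the end of the string. */
static int	count_words(const char *s, char c)
{
	int	count = 0;

	for (size_t i = 0; s[i] != '\0'; i++)
		if (s[i] != c && (s[i + 1] == c || s[i + 1] == '\0'))
			count++;
	return (count);
}
```

Reading s[i + 1] is safe because the loop only runs while s[i] is not '\0', so s[i + 1] is at worst the terminator. Leading, trailing and repeated delimiters do not create extra words.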

If wordc is 0, there are no words. In that case we allocate room for a single pointer with do_calloc; since the memory is zero-filled, that one slot is the terminating NULL pointer, so the caller gets an empty array (not an empty string). Otherwise, the else branch passes the work on to do_strtok_front.
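A minimal sketch of the wordc == 0 branch, using plain calloc in place of do_calloc and a hypothetical name `empty_split`:

```c
#include <stdlib.h>

/* The wordc == 0 case: one zeroed slot. calloc zero-fills it, so the only
 * entry is already the terminating NULL pointer (assuming all-bits-zero is
 * a null pointer, as on mainstream platforms). */
static char	**empty_split(void)
{
	return (calloc(1, sizeof(char *)));
}
```

A caller loop such as `for (i = 0; r[i]; i++)` therefore runs zero times, and the caller only has to free the array itself.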

 

do_strtok_front does the main work and is the function that calls do_strtok.

To hold the words, it first allocates result, a char ** array with wordc + 1 slots (the extra slot is the NULL terminator), using do_calloc. If the allocation fails, it returns NULL.

It then hands off to do_strtok.
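Because the array is zero-filled, every slot that has not been filled yet is NULL. That is what lets callers find the end without knowing wordc, and what lets do_free stop early after a partial failure. A sketch with hypothetical names `alloc_slots` and `count_entries`:

```c
#include <stdlib.h>

/* wordc + 1 zeroed pointers: one per word plus the NULL terminator
 * (assuming all-bits-zero is a null pointer, as on mainstream platforms). */
static char	**alloc_slots(int wordc)
{
	return (calloc((size_t)wordc + 1, sizeof(char *)));
}

/* Walks to the first NULL, the way do_free and callers find the end. */
static int	count_entries(char **arr)
{
	int	n = 0;

	while (arr[n] != NULL)
		n++;
	return (n);
}
```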

 

do_strtok first stores the address of the start of the string it received in tok.

While *tok is the delimiter c, it advances tok and also increments del_len, which counts the delimiters encountered before the word.

-> This way, when several delimiters appear in a row, none of them end up in the stored word.

(When do_strtok_front stores the word in result, it adds del to the pointer, which skips the delimiters.)
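The delimiter-skipping step in isolation, as a sketch with a hypothetical `skip_delims` that returns the same count do_strtok stores in *del_len. The extra '\0' check is an addition not present in do_strtok; it only matters when c is '\0':

```c
/* Counts the delimiters at the start of s, like do_strtok's first loop.
 * The '\0' check keeps it from running past the end when c == '\0'. */
static int	skip_delims(const char *s, char c)
{
	int	n = 0;

	while (s[n] != '\0' && s[n] == c)
		n++;
	return (n);
}
```

Adding the returned count to the segment start gives the first character of the word, which is exactly the `str + del` used in do_strtok_front.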

 

Then, as long as tok is on a word character (*tok is not c) and has not reached the terminating null, it keeps advancing tok and increments the word length len. (len is needed later for the dynamic allocation.)

(-> Writing *len++ advances the pointer itself, by the size of the type len points to, because postfix ++ binds more tightly than *; you have to write (*len)++ to increment the value len points to.)
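The precedence point can be checked directly: *len++ parses as *(len++), so it moves the pointer and leaves the int untouched. A minimal demonstration with hypothetical helper names:

```c
/* Increments the int that len points to: what do_strtok needs. */
static void	bump_value(int *len)
{
	(*len)++;
}

/* Parses as *(len++): reads the old *len (result discarded) and advances
 * the local pointer by one int; the caller's int is unchanged. */
static int	*bump_pointer(int *len)
{
	(void)*len++;
	return (len);
}
```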

Finally, it writes '\0' where tok stopped, which cuts the word off, and returns s. Note that s still points to the start of the segment it was given, leading delimiters included, which is why the caller adds del.
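A worked example of one do_strtok step on a concrete buffer. The body below is do_strtok's own logic under a hypothetical name, `tok_step`, so the snippet compiles on its own; like the original, it assumes a word remains in s, which holds because do_strtok_front calls it only wordc times:

```c
/* do_strtok's logic, renamed. Assumes a word remains in s. */
static char	*tok_step(char *s, char c, int *len, int *del)
{
	char	*tok;

	*len = 0;
	*del = 0;
	tok = s;
	while (*tok == c)			/* skip leading delimiters */
	{
		(*del)++;
		tok++;
	}
	while (*tok != c && *tok != '\0')	/* measure the word */
	{
		(*len)++;
		tok++;
	}
	*tok = '\0';				/* terminate the word in place */
	return (s);				/* segment start, delimiters included */
}
```

On the buffer "  ab cd" with c = ' ', this gives del = 2 and len = 2, and the space after "ab" becomes '\0'. buf + del + len + 1 then points at "cd", which is where the next call starts.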

 

Back in do_strtok_front, result[0] is allocated with do_calloc for len + 1 bytes, using the len just computed, and after the error check do_strlcpy copies the word in. Looking at the arguments: the destination is result[0]; the source is str + del, which skips the leading delimiters so it points at the first character of the first word; and the size is len + 1, so the len characters of the word plus the terminating '\0' are copied.
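strlcpy is a BSD extension rather than standard C, so here is a sketch of the semantics do_strlcpy is assumed to follow, under a hypothetical name `copy_bounded`. Because do_strtok already wrote '\0' right after the word, a size of len + 1 copies exactly the word and its terminator:

```c
#include <stddef.h>

/* Hypothetical stand-in for do_strlcpy with strlcpy(3) semantics: copies
 * at most size - 1 chars, NUL-terminates when size > 0, and returns
 * strlen(src) so the caller can detect truncation. */
static size_t	copy_bounded(char *dst, const char *src, size_t size)
{
	size_t	n = 0;

	while (src[n] != '\0')
	{
		if (n + 1 < size)
			dst[n] = src[n];
		n++;
	}
	if (size > 0)
		dst[n < size ? n : size - 1] = '\0';
	return (n);
}
```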

 

At that point result[0] holds the first word, and repeating the same steps in a while loop until wordc words have been stored finishes the job.

Inside the loop, do_strtok is called with str + len + del + 1: skipping the del delimiters and the len characters just processed, plus 1 for the '\0' that replaced the delimiter after the word, makes reading resume right after the previous word. The same steps then run again, and because str is updated to this new position on every iteration, each pass reaches the next word.

Once all the words have been read, result is returned.
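The pointer arithmetic of the loop can be traced without the allocations. A sketch, with a hypothetical `word_starts` that walks the buffer the same way and records where each word begins; wordc is assumed to come from a do_wordcount-style count:

```c
/* Walks buf like do_strtok_front's loop and records each word's offset.
 * Every word is NUL-terminated in place. */
static void	word_starts(char *buf, char c, int wordc, int *starts)
{
	char	*str = buf;
	int		len = 0;
	int		del = 0;

	for (int i = 0; i < wordc; i++)
	{
		if (i > 0)
			str += len + del + 1;	/* delims + word + its '\0' */
		del = 0;
		while (str[del] != '\0' && str[del] == c)
			del++;
		len = 0;
		while (str[del + len] != c && str[del + len] != '\0')
			len++;
		str[del + len] = '\0';
		starts[i] = (int)(str + del - buf);
	}
}
```

For ",ab,,cd" the words start at offsets 1 and 5: the second pass begins at offset 0 + 2 + 1 + 1 = 4 and skips one more delimiter.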

 

Back in do_split, we handle the error case and release str with free(str).

(An earlier version did not free str and leaked memory; with this call added, every allocation is paired with a free and the leak is gone.)

Finally, do_split returns result, with each word stored in its own slot, and the split is complete.
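Freeing str removes the leak inside do_split, but the returned array still belongs to the caller: each string and then the array itself must be released. A sketch of that cleanup, assuming the NULL-terminated layout above; `free_split` is a hypothetical name:

```c
#include <stdlib.h>

/* Releases a do_split-style result: every string up to the NULL
 * terminator, then the array. Accepts NULL. */
static void	free_split(char **words)
{
	if (words == NULL)
		return ;
	for (size_t i = 0; words[i] != NULL; i++)
		free(words[i]);
	free(words);
}
```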

 


I'm not happy that the code became so long and picked up so many unnecessary parts, so this is a split function I plan to rewrite.

Because I really wanted to build in the strtok idea, in places I took the long way around where a simpler route existed.

Next time I'll focus only on split's basic behavior and write it again.
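For comparison, a minimal sketch in the "basic behavior only" direction: it reads the const input directly (no strdup, no in-place '\0'), counts the words, then copies each one with malloc. This is only an illustration, not the author's planned rewrite; `simple_split`, `sw_count` and `sw_free` are hypothetical names:

```c
#include <stdlib.h>
#include <string.h>

/* Same counting rule as do_wordcount. */
static size_t	sw_count(const char *s, char c)
{
	size_t	n = 0;

	for (size_t i = 0; s[i] != '\0'; i++)
		if (s[i] != c && (s[i + 1] == c || s[i + 1] == '\0'))
			n++;
	return (n);
}

/* Frees the first n strings and the array after a failed malloc. */
static char	**sw_free(char **arr, size_t n)
{
	while (n > 0)
		free(arr[--n]);
	free(arr);
	return (NULL);
}

/* Splits s on c without modifying it; returns a NULL-terminated array. */
static char	**simple_split(const char *s, char c)
{
	char	**arr;
	size_t	words;
	size_t	len;

	if (s == NULL)
		return (NULL);
	words = sw_count(s, c);
	arr = malloc((words + 1) * sizeof(char *));
	if (arr == NULL)
		return (NULL);
	for (size_t i = 0; i < words; i++)
	{
		while (*s == c)			/* skip delimiters before the word */
			s++;
		len = 0;
		while (s[len] != '\0' && s[len] != c)
			len++;
		arr[i] = malloc(len + 1);
		if (arr[i] == NULL)
			return (sw_free(arr, i));
		memcpy(arr[i], s, len);
		arr[i][len] = '\0';
		s += len;			/* continue right after the word */
	}
	arr[words] = NULL;
	return (arr);
}
```

Because the input is never modified, there is no temporary copy to free, and the del/len bookkeeping shrinks to a single moving pointer.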