从 SQL 文本到分布式物理执行计划, 在 StarRocks 中,需要经过以下 5 个步骤:
1、SQL Parse: 将 SQL 文本转换成一个 AST(抽象语法树)
2、SQL Analyze:基于 AST 进行语法和语义分析 => 本文
3、SQL Logical Plan: 将 AST 转换成逻辑计划
4、SQL Optimize:基于关系代数、统计信息、Cost 模型,对逻辑计划进行重写、转换,选择出 Cost “最低” 的物理执行计
5、生成 Plan Fragment:将 Optimizer 选择的物理执行计划转换为 BE 可以直接执行的 Plan Fragment
binder 之后,logicalPlan 包含 metadata
。
- 源码路径:com.starrocks.sql.analyzer.AnalyzeSubqueryTest#testSimple
- sql
select * from (select count(v1) from t0) a
代码追踪
com.starrocks.sql.StatementPlanner#plan(com.starrocks.sql.ast.StatementBase, com.starrocks.qe.ConnectContext)
com.starrocks.sql.analyzer.Analyzer#analyze
com.starrocks.sql.analyzer.Analyzer.AnalyzerVisitor#visitQueryStatement
com.starrocks.sql.analyzer.QueryAnalyzer.Visitor#visitQueryStatement
table
// 一、先解析 Relation
// 二、process(Relation, scope);
// 三、最后解析select 字段
@Override
public Scope visitSelect(SelectRelation selectRelation, Scope scope) {
AnalyzeState analyzeState = new AnalyzeState();
//Record aliases at this level to prevent alias conflicts
Set<TableName> aliasSet = new HashSet<>();
Relation resolvedRelation = resolveTableRef(selectRelation.getRelation(), scope, aliasSet);
if (resolvedRelation instanceof TableFunctionRelation) {
throw unsupportedException("Table function must be used with lateral join");
}
selectRelation.setRelation(resolvedRelation);
Scope sourceScope = process(resolvedRelation, scope); // 新的 scope
sourceScope.setParent(scope);
SelectAnalyzer selectAnalyzer = new SelectAnalyzer(session);
selectAnalyzer.analyze(
analyzeState,
selectRelation.getSelectList(),
selectRelation.getRelation(),
sourceScope,
selectRelation.getGroupByClause(),
selectRelation.getHavingClause(),
selectRelation.getWhereClause(),
selectRelation.getOrderBy(),
selectRelation.getLimit());
selectRelation.fillResolvedAST(analyzeState);
return analyzeState.getOutputScope();
}
binder table
// com.starrocks.sql.analyzer.QueryAnalyzer.Visitor#resolveTableRef
private Relation resolveTableRef(Relation relation, Scope scope, Set<TableName> aliasSet) {
...
} else if (relation instanceof TableRelation) {
TableRelation tableRelation = (TableRelation) relation;
TableName tableName = tableRelation.getName();
...
TableName resolveTableName = relation.getResolveTableName();
// table 填充 catalog、db
MetaUtils.normalizationTableName(session, resolveTableName);
if (aliasSet.contains(resolveTableName)) {
ErrorReport.reportSemanticException(ErrorCode.ERR_NONUNIQ_TABLE,
relation.getResolveTableName().getTbl());
} else {
aliasSet.add(new TableName(resolveTableName.getCatalog(),
resolveTableName.getDb(),
resolveTableName.getTbl()));
}
// 获取 table metadata
Table table = resolveTable(tableRelation.getName());
if (table instanceof View) {
View view = (View) table;
QueryStatement queryStatement = view.getQueryStatement();
ViewRelation viewRelation = new ViewRelation(tableName, view, queryStatement);
viewRelation.setAlias(tableRelation.getAlias());
return viewRelation;
} else {
if (tableRelation.getTemporalClause() != null) {
if (table.getType() != Table.TableType.MYSQL) {
throw unsupportedException(
"unsupported table type for temporal clauses: " + table.getType() +
"; only external MYSQL tables support temporal clauses");
}
}
if (table.isSupported()) {
// binder table
tableRelation.setTable(table);
return tableRelation;
} else {
throw unsupportedException("unsupported scan table type: " + table.getType());
}
}
...
}
private Table resolveTable(TableName tableName) {
try {
MetaUtils.normalizationTableName(session, tableName);
String catalogName = tableName.getCatalog();
String dbName = tableName.getDb();
String tbName = tableName.getTbl();
if (Strings.isNullOrEmpty(dbName)) {
ErrorReport.reportAnalysisException(ErrorCode.ERR_NO_DB_ERROR);
}
if (!GlobalStateMgr.getCurrentState().getCatalogMgr().catalogExists(catalogName)) {
ErrorReport.reportAnalysisException(ErrorCode.ERR_BAD_CATALOG_ERROR, catalogName);
}
Database database = metadataMgr.getDb(catalogName, dbName);
MetaUtils.checkDbNullAndReport(database, dbName);
// 根据 (catalogName, dbName, tbName) 从 metadataMgr 获取 Table
Table table = metadataMgr.getTable(catalogName, dbName, tbName);
if (table == null) {
ErrorReport.reportAnalysisException(ErrorCode.ERR_BAD_TABLE_ERROR, dbName + "." + tbName);
}
if (table.isNativeTableOrMaterializedView() &&
(((OlapTable) table).getState() == OlapTable.OlapTableState.RESTORE
|| ((OlapTable) table).getState() == OlapTable.OlapTableState.RESTORE_WITH_LOAD)) {
ErrorReport.reportAnalysisException(ErrorCode.ERR_BAD_TABLE_STATE, "RESTORING");
}
return table;
} catch (AnalysisException e) {
throw new SemanticException(e.getMessage());
}
}
tableRelation.setTable(table)
设置之前 table 为null,set 后 tableRelation
才真正意义上有语义。
get column
Scope sourceScope = process(resolvedRelation, scope);
=> 当resolvedRelation = TableRelation 时,获取table columns ,构建新的scope
为后续解析准备
// com.starrocks.sql.analyzer.QueryAnalyzer.Visitor#visitTable
public Scope visitTable(TableRelation node, Scope outerScope) {
TableName tableName = node.getResolveTableName();
Table table = node.getTable();
ImmutableList.Builder<Field> fields = ImmutableList.builder();
ImmutableMap.Builder<Field, Column> columns = ImmutableMap.builder();
List<Column> fullSchema = node.isBinlogQuery()
? appendBinlogMetaColumns(table.getFullSchema()) : table.getFullSchema();
List<Column> baseSchema = node.isBinlogQuery()
? appendBinlogMetaColumns(table.getBaseSchema()) : table.getBaseSchema();
for (Column column : fullSchema) {
Field field;
if (baseSchema.contains(column)) {
field = new Field(column.getName(), column.getType(), tableName,
new SlotRef(tableName, column.getName(), column.getName()), true, column.isAllowNull());
} else {
field = new Field(column.getName(), column.getType(), tableName,
new SlotRef(tableName, column.getName(), column.getName()), false, column.isAllowNull());
}
columns.put(field, column);
fields.add(field);
}
// set columns
node.setColumns(columns.build());
String dbName = node.getName().getDb();
session.getDumpInfo().addTable(dbName, table);
...
// 构建新的 scope
Scope scope = new Scope(RelationId.of(node), new RelationFields(fields.build()));
node.setScope(scope);
return scope;
}
function
table
解析并绑定后,开始解析 select 片段
selectAnalyzer.analyze
sql 片段:select count(v1)
代码追踪
com.starrocks.sql.analyzer.SelectAnalyzer#analyze
com.starrocks.sql.analyzer.SelectAnalyzer#analyzeSelect
com.starrocks.sql.analyzer.SelectAnalyzer#analyzeExpression
com.starrocks.sql.analyzer.ExpressionAnalyzer#bottomUpAnalyze
private void bottomUpAnalyze(Visitor visitor, Expr expression, Scope scope) {
if (expression.hasLambdaFunction(expression)) {
analyzeHighOrderFunction(visitor, expression, scope);
} else {
for (Expr expr : expression.getChildren()) {
// children 可能就是slotRef => 字段
bottomUpAnalyze(visitor, expr, scope);
}
}
visitor.visit(expression, scope);
}
当前案例中
expression
=>FunctionCallExpr{count(v1)}
- getChildren =>
SlofRef
,解析并绑定列字段v1,下小节分析
继续看 function 时如何绑定的,以内置函数为例
// com.starrocks.sql.analyzer.ExpressionAnalyzer.Visitor#visitFunctionCall
public Void visitFunctionCall(FunctionCallExpr node, Scope scope) {
Type[] argumentTypes = node.getChildren().stream().map(Expr::getType).toArray(Type[]::new);
if (node.isNondeterministicBuiltinFnName()) {
ExprId exprId = analyzeState.getNextNondeterministicId();
node.setNondeterministicId(exprId);
}
Function fn;
String fnName = node.getFnName().getFunction();
...
// 内置 function 查找
fn = Expr.getBuiltinFunction(fnName, argumentTypes, Function.CompareMode.IS_NONSTRICT_SUPERTYPE_OF);
}
...
// 校验函数参数类型
for (int i = 0; i < fn.getNumArgs(); i++) {
if (!argumentTypes[i].matchesType(fn.getArgs()[i]) &&
!Type.canCastToAsFunctionParameter(argumentTypes[i], fn.getArgs()[i])) {
String msg = String.format("No matching function with signature: %s(%s)", fnName,
node.getParams().isStar() ? "*" :
Arrays.stream(argumentTypes).map(Type::toSql).collect(Collectors.joining(", ")));
throw new SemanticException(msg, node.getPos());
}
}
if (fn.hasVarArgs()) {
Type varType = fn.getArgs()[fn.getNumArgs() - 1];
for (int i = fn.getNumArgs(); i < argumentTypes.length; i++) {
if (!argumentTypes[i].matchesType(varType) &&
!Type.canCastToAsFunctionParameter(argumentTypes[i], varType)) {
String msg = String.format("Variadic function %s(%s) can't support type: %s", fnName,
Arrays.stream(fn.getArgs()).map(Type::toSql).collect(Collectors.joining(", ")),
argumentTypes[i]);
throw new SemanticException(msg, node.getPos());
}
}
}
// binder function 及返回值
node.setFn(fn);
node.setType(fn.getReturnType());
FunctionAnalyzer.analyze(node);
return null;
}
public static Function getBuiltinFunction(String name, Type[] argTypes, Function.CompareMode mode) {
FunctionName fnName = new FunctionName(name);
Function searchDesc = new Function(fnName, argTypes, Type.INVALID, false);
return GlobalStateMgr.getCurrentState().getFunction(searchDesc, mode);
}
public Function getFunction(Function desc, Function.CompareMode mode) {
return functionSet.getFunction(desc, mode);
}
com.starrocks.catalog.FunctionSet
中函数是如何生成的,查询技术内幕|StarRocks 标量函数与聚合函数
若根据函数名和参数类型没有找到对应的函数,报异常
// com.starrocks.sql.analyzer.ExpressionAnalyzer.Visitor#visitFunctionCall
if (fn == null) {
String msg = String.format("No matching function with signature: %s(%s)",
fnName,
node.getParams().isStar() ? "*" : Joiner.on(", ")
.join(Arrays.stream(argumentTypes).map(Type::toSql).collect(Collectors.toList())));
throw new SemanticException(msg, node.getPos());
}
column
在binder function 过程中,需要先解析 Children; count(v1)
中,v1
就是 Children。
// com.starrocks.sql.analyzer.ExpressionAnalyzer.Visitor#visitSlot
public Void visitSlot(SlotRef node, Scope scope) {
// 从scope 中获取 field,在解析table 时,已将fileds 设置进 scope => Scope scope = new Scope(RelationId.of(node), new RelationFields(fields.build()));
ResolvedField resolvedField = scope.resolveField(node);
// binder column
node.setType(resolvedField.getField().getType());
node.setTblName(resolvedField.getField().getRelationAlias());
// help to get nullable info in Analyzer phase
// now it is used in creating mv to decide nullable of fields
node.setNullable(resolvedField.getField().isNullable());
...
handleResolvedField(node, resolvedField);
return null;
}
未绑定之前,SlofRef
只有col、label,没有任何含义,binder 赋予 type
、tableName
、Nullable
属性值,变得有含义。