From cb73e749c0751cd3e08218bb6b6c60725cff6daf Mon Sep 17 00:00:00 2001
From: cyongli
Date: Thu, 31 Aug 2017 09:45:08 +0800
Subject: [PATCH] change Cloudera Impala to Apache Impala

---
 README.md                               |  4 ++--
 gensrc/script/gen_builtins_functions.py | 10 +++++---
 gensrc/script/gen_opcodes.py            | 30 ++++++++++++++----------
 gensrc/script/gen_vector_functions.py   | 31 ++++++++++++++-----------
 4 files changed, 45 insertions(+), 30 deletions(-)

diff --git a/README.md b/README.md
index 2e2a31d51c1976..c34cf0687df8ba 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Introduction to Palo
 
-Palo is an MPP-based interactive SQL data warehousing for reporting and analysis. Palo mainly integrates the technology of Google Mesa and Cloudera Impala. Unlike other popular SQL-on-Hadoop systems, Palo is designed to be a simple and single tightly coupled system, not depending on other systems. Palo not only provides high concurrent low latency point query performance, but also provides high throughput queries of ad-hoc analysis. Palo not only provides batch data loading, but also provides near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability. The simplicity (of developing, deploying and using) and meeting many data serving requirements in single system are the main features of Palo.
+Palo is an MPP-based interactive SQL data warehousing for reporting and analysis. Palo mainly integrates the technology of Google Mesa and Apache Impala. Unlike other popular SQL-on-Hadoop systems, Palo is designed to be a simple and single tightly coupled system, not depending on other systems. Palo not only provides high concurrent low latency point query performance, but also provides high throughput queries of ad-hoc analysis. Palo not only provides batch data loading, but also provides near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability. The simplicity (of developing, deploying and using) and meeting many data serving requirements in single system are the main features of Palo.
 
 ## 1. Background
 
@@ -10,7 +10,7 @@ Prior to Palo, different tools were deployed to solve diverse requirements in ma
 However, when a use case requires the simultaneous availability of capabilities that cannot all be provided by a single tool, users were forced to build hybrid architectures that stitch multiple tools together. Users often choose to ingest and update data in one storage system, but later reorganize this data to optimize for an analytical reporting use-case served from another. Our users had been successfully deploying and maintaining these hybrid architectures, but we believe that they shouldn’t need to accept their inherent complexity. A storage system built to provide great performance across a broad range of workloads provides a more elegant solution to the problems that hybrid architectures aim to solve.
 
 Palo is the solution. Palo is designed to be a simple and single tightly coupled system, not depending on other systems. Palo provides high concurrent low latency point query performance, but also provides high throughput queries of ad-hoc analysis. Palo provides bulk-batch data loading, but also provides near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability.
 
-Generally speaking, Palo is the technology combination of Google Mesa and Cloudera Impala. Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy complex and challenging set of users’ and systems’ requirements, including near real-time data ingestion and query ability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes. Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. At present, by virtue of its superior performance and rich functionality, Impala has been comparable to many commercial MPP database query engine. Mesa can satisfy the needs of many of our storage requirements, however Mesa itself does not provide a SQL query engine; Impala is a very good MPP SQL query engine, but the lack of a perfect distributed storage engine. So in the end we chose the combination of these two technologies.
+Generally speaking, Palo is the technology combination of Google Mesa and Apache Impala. Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy complex and challenging set of users’ and systems’ requirements, including near real-time data ingestion and query ability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes. Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. At present, by virtue of its superior performance and rich functionality, Impala has been comparable to many commercial MPP database query engine. Mesa can satisfy the needs of many of our storage requirements, however Mesa itself does not provide a SQL query engine; Impala is a very good MPP SQL query engine, but the lack of a perfect distributed storage engine. So in the end we chose the combination of these two technologies.
 
 Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. Then we deeply integrate this storage engine with Impala query engine. Query compiling, query execution coordination and catalog management of storage engine are integrated to be frontend daemon; query execution and data storage are integrated to be backend daemon. With this integration, we implemented a single, full-featured, high performance state the art of MPP database, as well as maintaining the simplicity.
diff --git a/gensrc/script/gen_builtins_functions.py b/gensrc/script/gen_builtins_functions.py
index 5f318d9de2fb31..5bccf70bd43a6f 100644
--- a/gensrc/script/gen_builtins_functions.py
+++ b/gensrc/script/gen_builtins_functions.py
@@ -11,9 +11,13 @@
 // Modifications copyright (C) 2017, Baidu.com, Inc. \n\
 // Copyright 2017 The Apache Software Foundation \n\
 // \n\
-// Licensed under the Apache License, Version 2.0 (the "License");\n\
-// you may not use this file except in compliance with the License.\n\
-// You may obtain a copy of the License at\n\
+// Licensed to the Apache Software Foundation (ASF) under one \n\
+// or more contributor license agreements. See the NOTICE file \n\
+// distributed with this work for additional information \n\
+// regarding copyright ownership. The ASF licenses this file \n\
+// to you under the Apache License, Version 2.0 (the \n\
+// "License"); you may not use this file except in compliance \n\
+// with the License. You may obtain a copy of the License at \n\
 // \n\
 // http://www.apache.org/licenses/LICENSE-2.0\n\
 // \n\
diff --git a/gensrc/script/gen_opcodes.py b/gensrc/script/gen_opcodes.py
index 72df681d6d7fcb..7782ba48428ac9 100755
--- a/gensrc/script/gen_opcodes.py
+++ b/gensrc/script/gen_opcodes.py
@@ -78,18 +78,24 @@
 \n'
 
 cc_registry_preamble = '\
-// Copyright 2012 Cloudera Inc.\n\
-//\n\
-// Licensed under the Apache License, Version 2.0 (the "License");\n\
-// you may not use this file except in compliance with the License.\n\
-// You may obtain a copy of the License at\n\
-//\n\
-// http://www.apache.org/licenses/LICENSE-2.0\n\
-//\n\
-// Unless required by applicable law or agreed to in writing, software\n\
-// distributed under the License is distributed on an "AS IS" BASIS,\n\
-// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n\
-// See the License for the specific language governing permissions and\n\
+// Modifications copyright (C) 2017, Baidu.com, Inc. \n\
+// Copyright 2017 The Apache Software Foundation \n\
+// \n\
+// Licensed to the Apache Software Foundation (ASF) under one \n\
+// or more contributor license agreements. See the NOTICE file \n\
+// distributed with this work for additional information \n\
+// regarding copyright ownership. The ASF licenses this file \n\
+// to you under the Apache License, Version 2.0 (the \n\
+// "License"); you may not use this file except in compliance \n\
+// with the License. You may obtain a copy of the License at \n\
+// \n\
+// http://www.apache.org/licenses/LICENSE-2.0\n\
+// \n\
+// Unless required by applicable law or agreed to in writing, software\n\
+// distributed under the License is distributed on an "AS IS" BASIS,\n\
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n\
+// See the License for the specific language governing permissions and\n\
+// limitations under the License.\n\
 // limitations under the License.\n\
 \n\
 // This is a generated file, DO NOT EDIT.\n\
diff --git a/gensrc/script/gen_vector_functions.py b/gensrc/script/gen_vector_functions.py
index 04963c9ae93de5..fab89836ec6989 100755
--- a/gensrc/script/gen_vector_functions.py
+++ b/gensrc/script/gen_vector_functions.py
@@ -442,19 +442,24 @@ class VectorComputeFunctions {\n\
 
 python_preamble = '\
 #!/usr/bin/env python\n\
-# Copyright 2012 Cloudera Inc.\n\
-#\n\
-# Licensed under the Apache License, Version 2.0 (the "License");\n\
-# you may not use this file except in compliance with the License.\n\
-# You may obtain a copy of the License at\n\
-#\n\
-# http://www.apache.org/licenses/LICENSE-2.0\n\
-#\n\
-# Unless required by applicable law or agreed to in writing, software\n\
-# distributed under the License is distributed on an "AS IS" BASIS,\n\
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n\
-# See the License for the specific language governing permissions and\n\
-# limitations under the License.\n\
+# Modifications copyright (C) 2017, Baidu.com, Inc. \n\
+# Copyright 2017 The Apache Software Foundation \n\
+# \n\
+# Licensed to the Apache Software Foundation (ASF) under one \n\
+# or more contributor license agreements. See the NOTICE file \n\
+# distributed with this work for additional information \n\
+# regarding copyright ownership. The ASF licenses this file \n\
+# to you under the Apache License, Version 2.0 (the \n\
+# "License"); you may not use this file except in compliance \n\
+# with the License. You may obtain a copy of the License at \n\
+# \n\
+# http://www.apache.org/licenses/LICENSE-2.0\n\
+# \n\
+# Unless required by applicable law or agreed to in writing, software\n\
+# distributed under the License is distributed on an "AS IS" BASIS,\n\
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n\
+# See the License for the specific language governing permissions and\n\
+# limitations under the License.\n\
 \n\
 # This is a generated file, DO NOT EDIT IT.\n\
 # To add new functions, see impala/common/function-registry/gen_opcodes.py\n\